Sensor nodes in a distributed sensor network can fail for a variety of reasons, e.g., harsh environmental conditions, sabotage, battery failure, and component wear-out. Since many wireless sensor networks are intended to operate in an unattended manner after deployment, failing nodes cannot be replaced or repaired during field operation. Therefore, by designing the network to be fault-tolerant, we can ensure that a wireless sensor network can perform its surveillance and tracking tasks even when some nodes in the network fail. In this paper, we describe a fault-tolerant self-organization scheme that designates a set of backup nodes to replace failed nodes and maintain a backbone for coverage and communication. The proposed scheme does not require a centralized server for monitoring node failures and for designating backup nodes to replace failed nodes. It operates in a fully distributed manner and requires only localized communication. This scheme has been implemented on top of an energy-efficient self-organization technique for sensor networks. The proposed fault-tolerance-node selection procedure can tolerate a large number of node failures using only localized communication, without losing either sensing coverage or communication connectivity.
1. Introduction

Wireless sensor networks can be deployed to provide continuous surveillance and monitoring over a designated area of interest [2,7,19,22]. Many wireless sensor nodes have low cost and small form factors [1,2,7]; therefore, they can be deployed in large numbers with high redundancy. A typical example of such low-cost sensor nodes is the set of Berkeley motes from Crossbow Technology [33]. Since nodes are deployed in a redundant fashion, not every node in the network needs to be continuously active for sensing and communication. The operational lifetime of sensor networks can be increased by network organization schemes for topology control, where only a subset of nodes is kept active while the other nodes are kept in a sleep state or a power-saving mode [14,27,31]. Fewer active nodes also place less demand on the limited network bandwidth.

Since a wireless sensor network should ideally perform surveillance tasks in an unattended manner, it needs to operate as long as possible, even when many sensor nodes fail. This motivates our work on fault-tolerant self-organization. Most recent work aims to provide fault tolerance in the deterministic deployment of sensor nodes [9,17,21,24]. Much less attention has been devoted to distributed protocols that can replace failing nodes in the network with spare nodes. Failing sensor nodes result in coverage loss and broken communication connectivity; hence there is a need for a distributed node replacement protocol and self-organization scheme that designates nodes as fault tolerance (spare) nodes. Such a scheme should be fully distributed so that it scales to a large number of nodes. It should require only localized communication to select backup nodes for fault tolerance, and it should not rely on a centralized server to identify and replace faulty nodes.

This paper presents a redundancy analysis and a distributed self-organization scheme that ensures communication connectivity and sensing coverage when nodes fail, either sequentially or simultaneously. We first present analytical results that characterize the extent of redundancy needed for fault tolerance. We then describe a distributed scheme that achieves fault tolerance by selecting fault tolerance nodes that can replace failing nodes. The proposed distributed approach uses only single-hop or restricted-hop neighborhood information to select fault tolerance nodes. We show that the proposed approach provides communication connectivity and sensing coverage even when up to $\Omega$ nodes fail, where $\Omega$ is a user-defined parameter.

The paper is organized as follows. In Section 2, we briefly describe related prior work. In Section 3, we present the background and assumptions used in this paper. Section 4 describes fault tolerance for communication connectivity. Section 5 addresses fault tolerance for sensing coverage. We present simulation results for the proposed distributed self-organization technique in Section 6. Section 7 concludes the paper and outlines directions for future work.

2. Related Work

Energy-efficient self-organization in wireless sensor networks has received considerable attention in the literature [13,16,20,23,26,29]. Energy considerations have been used to find a set of (active) nodes that can form a backbone for the network. These backbone nodes can be selected using heuristics based on the concept of a connected dominating set [3,4,25,28]; among these, the distributed algorithm proposed in [3] has the best message complexity. The selection of active nodes to guarantee both sensing coverage and communication connectivity has been studied in [14,27,31]. A recent approach distinguishes connectivity from sensing, and determines the configuration of the nodes with both communication connectivity and sensing coverage as considerations [27].

Fault tolerance in distributed sensor networks has received relatively less attention [9,17,21,24]. Problems studied include the characterization of sensor fault modalities [17,24], fault tolerance in multiple-sensor fusion [21], and reliable information dissemination [9]. Recent work on fault tolerance in wireless sensor networks can be categorized as focusing on either fault detection [6,10,12] or fault-tolerant operations [15,30]. In [10], the authors present various fault tolerance techniques at different levels, including the physical layer for communication, the hardware components of a sensor node, system software such as the embedded operating system, middleware, and applications. In [12], the authors consider faults in node sensor measurements and develop a distributed Bayesian algorithm to detect and correct such faults. [6] also addresses a similar fault detection problem and presents a crash identification mechanism. In [30], the authors show that a sensor network with $n$ nodes is asymptotically connected if each node is directly connected to at least $5.1774 \log n$ neighboring nodes. [15] shows that for a wireless sensor network with $n$ nodes, the connectivity probability with up to $k$ failing nodes is at least $e^{-e^{-\alpha}}$ when the transmission radius $r$ satisfies $n\pi r^2 \ge \ln n + (2k - 1)\ln\ln n - 2\ln k! + 2\alpha$. Recently, in [11], a protocol has been proposed for event detection in sensor networks that is able to handle both natural and malicious node failures. However, most prior work has not characterized the redundancy necessary for fault tolerance, and no distributed self-organization protocol has directly considered this issue.
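The condition attributed to [15] above can be turned into a quick calculator for the smallest transmission radius it admits. This is a sketch under stated assumptions: it evaluates the inequality exactly as written, treats the field as normalized to unit area (an assumption, since the text does not state the field size used in [15]), and the function names are hypothetical.

```python
import math


def min_radius_for_k_failures(n: int, k: int, alpha: float) -> float:
    """Smallest r with n*pi*r^2 >= ln n + (2k-1) ln ln n - 2 ln k! + 2*alpha.

    Assumes a unit-area field (our assumption); n must be at least 3 so that
    ln ln n is positive.
    """
    if n < 3 or k < 0:
        raise ValueError("need n >= 3 and k >= 0")
    rhs = (math.log(n)
           + (2 * k - 1) * math.log(math.log(n))
           - 2 * math.lgamma(k + 1)  # lgamma(k + 1) equals ln(k!)
           + 2 * alpha)
    if rhs <= 0:
        return 0.0  # the inequality already holds for any radius
    return math.sqrt(rhs / (n * math.pi))


def connectivity_probability_bound(alpha: float) -> float:
    """Lower bound e^{-e^{-alpha}} on the connectivity probability quoted from [15]."""
    return math.exp(-math.exp(-alpha))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, min_radius_for_k_failures(1000, k, alpha=0.0))
    print(connectivity_probability_bound(0.0))
```

Tolerating more failures (larger $k$) raises the right-hand side and therefore the required radius; with $\alpha = 0$ the probability bound is $e^{-1} \approx 0.368$.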

3. Preliminaries

3.1. Assumptions

The discussion in this paper is based on the following assumptions:

1. The ad hoc sensor network is deployed with a sufficient number of nodes such that the network is connected. All sensor nodes have the same maximum communication range $r_c$ and maximum sensing range $r_s$.

2. We represent the surveillance field by a 2D grid of dimension $X \times Y$. Let $Q = \{g_1, g_2, \ldots, g_m\}$ be the set of all grid points, and $m = |Q| = XY$.

3. We use $S$ to denote the set of $n$ sensor nodes that have been placed in the sensor field, i.e., $|S| = n$. A node with id $k$ is referred to as $s_k$ ($s_k \in S$, $1 \le k \le n$). Let $d_i^k$ be the distance between the grid point $g_i$ and the sensor node $s_k$. In a graph model $G(V, E)$ for a set $S$ of nodes, we use the vertex $v \in V$ interchangeably with its corresponding node $s \in S$. The set of edges $E$ denotes the connectivity between nodes.

4. We model sensing coverage using the probability $p_i^k$ that a target at grid point $g_i$ is detected by a node $s_k$:

$$p_i^k = \begin{cases} e^{-\alpha d_i^k}, & \text{if } d_i^k \le r_s; \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

where $\alpha$ is a parameter representing the physical characteristics of the sensor. The model conveys the intuition that the closer a location is to the node, the higher the expected signal-to-noise ratio, and hence the higher the confidence that a target at that location is detected. Areas beyond the maximum sensing range $r_s$ are considered too noisy for the sensor node to determine whether a target is present. The sensing model is used only for coverage evaluation during active node selection; alternative sensing models can also be easily considered. Assume that $S_i$ is the set of nodes that can detect grid point $g_i$; the detection probability for grid point $g_i$ is then evaluated using Equation (1) as

$$p_i(S_i) = 1 - \prod_{s_k \in S_i} \left(1 - p_i^k\right). \tag{2}$$
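Equations (1) and (2) can be evaluated directly. The following Python sketch assumes the exponential form $p_i^k = e^{-\alpha d_i^k}$ for $d_i^k \le r_s$ (and 0 otherwise); the function names are illustrative and are not part of CCANS.

```python
import math
from typing import Iterable


def detection_prob(d: float, alpha: float, r_s: float) -> float:
    """Equation (1): p_i^k = exp(-alpha * d_i^k) if d_i^k <= r_s, else 0."""
    return math.exp(-alpha * d) if d <= r_s else 0.0


def joint_detection_prob(distances: Iterable[float], alpha: float, r_s: float) -> float:
    """Equation (2): p_i(S_i) = 1 - prod over s_k in S_i of (1 - p_i^k).

    `distances` holds d_i^k for each candidate node s_k; nodes farther than r_s
    contribute a factor of 1 and therefore do not change the product.
    """
    miss = 1.0  # probability that every node misses a target at g_i
    for d in distances:
        miss *= 1.0 - detection_prob(d, alpha, r_s)
    return 1.0 - miss


# Two nodes in range (d = 1 and d = 2) and one out of range (d = 5).
print(joint_detection_prob([1.0, 2.0, 5.0], alpha=0.5, r_s=3.0))  # about 0.751
```

Because out-of-range nodes contribute a factor of 1, the function can be handed every node's distance rather than only the members of $S_i$.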
3.2. Coverage- and Connectivity-centric Selection of Active Nodes for Self-organization

In this paper, we focus on the fault tolerance problem in the topology control of ad hoc sensor networks. We assume that a network organization scheme is provided to the sensor network. Network organization can be achieved by using techniques described in [14,27,31], which select a subset of active nodes as a backbone for communication connectivity and/or sensing coverage. The failure of these active nodes can result in loss of connectivity and/or loss of sensing coverage. We use $S_a$ ($S_s$) to denote the set of active (sleeping) nodes determined by such active node selection algorithms, and the following discussion assumes that $S_a$ and $S_s$ have already been determined. We consider the threshold $p_{th}$ to be the parameter that defines successful sensing coverage over the sensor field. The following conditions are implicitly satisfied: 1) $\forall g_i \in Q$ and $S_i \subseteq S_a$, $p_i(S_i) \ge p_{th}$; 2) $S_a$ is connected.
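A centralized checker for the two conditions above is a useful test oracle when experimenting with active node selection, even though CCANS itself evaluates them in a distributed way. In this sketch, which is our own illustration, grid points are assumed to sit at integer coordinates $(x, y)$ with $0 \le x < X$ and $0 \le y < Y$, condition 1 is evaluated with the exponential sensing model over active nodes only, and two active nodes are assumed to be linked when their distance is at most $r_c$.

```python
import math
from collections import deque


def grid_covered(active: list[tuple[float, float]], X: int, Y: int,
                 alpha: float, r_s: float, p_th: float) -> bool:
    """Condition 1: every grid point g_i reaches detection probability >= p_th.

    Assumed layout: grid points at integer (x, y), 0 <= x < X, 0 <= y < Y;
    only active nodes contribute to coverage.
    """
    for gx in range(X):
        for gy in range(Y):
            miss = 1.0
            for (sx, sy) in active:
                d = math.hypot(gx - sx, gy - sy)
                if d <= r_s:  # Equation (1): zero contribution outside r_s
                    miss *= 1.0 - math.exp(-alpha * d)
            if 1.0 - miss < p_th:  # Equation (2) falls below the threshold
                return False
    return True


def backbone_connected(active: list[tuple[float, float]], r_c: float) -> bool:
    """Condition 2: the graph on S_a (edge iff distance <= r_c) is connected."""
    if not active:
        return False
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search from the first active node
        i = queue.popleft()
        for j, other in enumerate(active):
            if j not in seen and math.dist(active[i], other) <= r_c:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(active)


nodes = [(1.0, 1.0), (3.0, 1.0)]
print(grid_covered(nodes, 5, 3, alpha=0.1, r_s=3.0, p_th=0.5))
print(backbone_connected(nodes, r_c=2.0))
```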

In [31], we have shown that the problem of selecting a subset of nodes as a backbone for both sensing coverage and communication connectivity is NP-complete. We have also presented the token-based coverage- and connectivity-centric active node selection (CCANS) protocol, which achieves self-organization with a subset of active nodes that are responsible for both coverage and connectivity. In this section, we first review the token-based CCANS protocol for energy-efficient self-organization. We then describe the problem of providing fault tolerance to active backbone nodes in the following sections. The proposed fault-tolerant self-organization technique is general, and it can also be used with other self-organization protocols. CCANS is used in this paper as a vehicle to evaluate the proposed method.


3.3. Token-based CCANS Protocol

There are three types of messages used in this protocol, namely HELLO, STATE, and UPDATE. These messages contain fields such as tokenid and srcid, which enable the token to control the execution of the sensing coverage evaluation and connectivity checking. There are three possible states for all nodes, namely UNSET, SLEEP, and ACTIVE. Initially, all nodes are in the UNSET state with tokenid = -1, i.e., no token has been given to them for the execution of the CCANS algorithm. There are two stages in the CCANS protocol, namely Stage 1 for node sensing coverage evaluation, followed by Stage 2 for node state and connectivity checking. The node with the assigned token is referred to as the token node, and all other nodes either collect messages sent from the token node or perform no action. In Stage 1, the current token node evaluates the coverage within its sensing area versus the coverage within its sensing area contributed by its neighbors. It chooses the state ACTIVE if its sensing area is not fully covered by its neighbors; otherwise it chooses the state SLEEP. However, this state decision is not final until the connectivity checking and coverage re-evaluation are completed in Stage 2. One node is pre-selected by the base station as the start node to initiate the execution of the CCANS algorithm for finding a subset of active nodes.

The token passing procedure is designed to reduce the execution time of the algorithm by expanding the global sensing coverage as much as possible [31]. Consider an arbitrarily chosen node $s_k$. Node $s_k$ gets the token for execution of the CCANS algorithm when $id(s_k) = tokenid$. If $tokensrc(s_k) = -1$, then $s_k$ sets $tokensrc(s_k) = srcid$; this is set only once. Therefore, every node knows its token source and is able to pass the token back to its token source when it completes CCANS Stage 2. If $s_k$ is the start node, then initially $tokensrc(s_k) = id(s_k) \neq -1$. When the token is eventually passed back to the start node, if it has no UNSET neighbors, it executes Stage 2 of the distributed CCANS procedure to make its own final state decision, and the distributed CCANS procedure then terminates.

Redundancy Analysis for Wireless Sensor Networks

As an example, Fig. 1(a) illustrates token passing for a sensor network with four sensor nodes, $s_1$, $s_2$, $s_3$, and $s_4$, where $s_1$ is the start node. The steps in this example are as follows:

(a) Initially, all nodes are in the UNSET state and $s_1$ is the start node.

(b) $s_1$ has completed CCANS Stage 1 and passes the token to $s_2$.

(c) $s_2$ has completed CCANS Stage 1 and passes the token to $s_3$.

(d) $s_3$ has no more UNSET neighbors and has completed CCANS Stage 2; therefore, $s_3$ passes the token back to $s_2$.

(e) $s_2$ still has UNSET neighbors, so $s_2$ passes the token to $s_4$.

(f) $s_4$ has no more UNSET neighbors and has completed CCANS Stage 2; therefore, $s_4$ passes the token back to $s_2$.

(g) $s_2$ has no more UNSET neighbors and has completed CCANS Stage 2; therefore, $s_2$ passes the token back to $s_1$.

(h) $s_1$ has no more UNSET neighbors and has completed CCANS Stage 2. Since $s_1$ is the start node, all nodes have made their state decisions, and CCANS terminates.
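The token walk in steps (a) to (h) behaves like a depth-first traversal in which each node remembers its token source. The sketch below replays only the token path and omits the Stage 1 and Stage 2 state decisions. When a node has several UNSET neighbors, it picks the one with the smallest id; this tie-breaking rule is a hypothetical choice of the sketch, since the CCANS rule in [31] chooses the next token node so as to expand global sensing coverage.

```python
def ccans_token_sequence(neighbors: dict[int, list[int]], start: int) -> list[int]:
    """Replay the token path of distributed CCANS; Stage 1/2 decisions are omitted."""
    token_src = {start: start}      # the start node is its own token source
    unset = set(neighbors) - {start}
    sequence = [start]
    node = start
    while True:
        # Hypothetical tie-break: the lowest-id UNSET neighbor gets the token next.
        nxt = next((v for v in sorted(neighbors[node]) if v in unset), None)
        if nxt is not None:
            unset.discard(nxt)      # a node leaves UNSET once it has held the token
            token_src[nxt] = node   # the token source is recorded only once
            node = nxt
        elif node == start:
            return sequence         # start node has finished Stage 2: terminate
        else:
            node = token_src[node]  # Stage 2 done: return the token to its source
        sequence.append(node)


example = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
print(ccans_token_sequence(example, start=1))  # [1, 2, 3, 2, 4, 2, 1]
```

Assuming links only between $s_1$ and $s_2$, $s_2$ and $s_3$, and $s_2$ and $s_4$ (a topology consistent with steps (a) to (h)), the replay produces the sequence 1, 2, 3, 2, 4, 2, 1.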

Fig. 1(b) shows the sequence of token sources, in terms of node id, during the execution of the distributed CCANS procedure for the example shown in Fig. 1(a). The CCANS procedure requires only a constant number of rounds of message exchange in both stages [31]. Let $\Delta$ be the maximum node degree in the graph corresponding to the sensor network. The connectivity checking procedure in CCANS has a time complexity of $O(\Delta^2)$ per node, and it is carried out independently by each node. Since the sensing coverage evaluation is carried out per grid point for all nodes in the neighborhood, the time complexity of the sensing coverage evaluation in CCANS is $O(m\Delta)$, where $m$ is the number of grid points representing the sensor field. Therefore, the overall time complexity of the CCANS procedure per node is $O(m\Delta + \Delta^2)$. The complexity depends only on the maximum degree of a node and the grid granularity of the sensor field. As shown in [31], the CCANS protocol always terminates and achieves self-organization. The base station can easily be notified of the completion of the distributed CCANS procedure.

3.4. Fault-tolerant Self-organization

In this paper, we focus on fault-tolerant self-organization, where both the sensing coverage and the connectivity are preserved with support from designated fault tolerance (FT) nodes when active nodes fail. We refer to this as the fault-tolerance-nodes-selection (FTNS) problem. The proposed distributed FTNS algorithm is executed after $S_a$ and $S_s$ are determined, and it designates a set $S_f$ of nodes to be FT nodes (backup nodes for active nodes). These FT nodes provide fault tolerance for the existing active nodes. They need not be active unless the active nodes that they are supporting fail. They can run in a power-saving mode and periodically query, using very limited bandwidth, whether the active nodes are still alive.

[Figure 1. (a) Token passing among $s_1$, $s_2$, $s_3$, and $s_4$ in steps (a) to (h); legend: current token node, next token node, UNSET node, node that has finished CCANS Stage 1, node that has finished CCANS Stage 2. (b) Token passing sequence in terms of node id, growing step by step to 1 → 2 → 3 → 2 → 4 → 2 → 1, after which CCANS terminates.]

FIGURE 1 (a) Example of token passing in the distributed CCANS procedure. (b) Token passing sequence for the example in Fig. 1(a).

Note that simultaneous failures of nodes in $S_a$ and $S_f$ may result in loss of sensing coverage or broken communication connectivity, since FT nodes are not themselves backed up by nodes in $S_f$. However, if only FT nodes fail, or if FT nodes and their non-neighboring active nodes fail, sensing coverage and communication connectivity are still guaranteed. Furthermore, the proposed distributed algorithm can be applied repeatedly to select more FT nodes for the previously selected FT nodes.

We assume that the number of nodes initially deployed in the sensor field is sufficient to achieve fault-tolerant operation, i.e., enough sleeping nodes are available to be selected as FT nodes. Some observations and additional definitions are listed below:

• It is trivial to see that if all failing nodes are sleeping nodes, the existing active nodes can tolerate the failure of up to $|S_s|$ nodes.

• We define the maximum number of active nodes that can fail simultaneously without losing sensing coverage or communication connectivity as the degree of fault tolerance (DOFT), denoted by $\Omega$ ($\Omega \ge 1$).

• The nodes that are selected from the set of sleeping nodes to obtain an $\Omega$-DOFT wireless sensor network are referred to as $\Omega$-fault-tolerant ($\Omega$-FT) nodes. We denote the set of $\Omega$-FT nodes as $S_f^{\Omega}$.

• Let $S_f^0 = \emptyset$ and $S^{\Omega} = S_f^{\Omega} \cup S_a$. It follows that $S^{\Omega}$ provides a solution to the $\Omega$-DOFT FTNS problem. In other words, an $\Omega$-DOFT FTNS-derived sensor network is still connected and provides undiminished coverage of the surveillance area if any $\Omega$ active nodes fail.
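For small networks, the connectivity half of the $\Omega$-DOFT definition can be checked by brute force, which is handy for validating an FT-node selection offline. The sketch below is our own illustration and ignores sensing coverage: it enumerates every set of up to $\Omega$ failed active nodes and tests whether the surviving nodes of $S_a \cup S_f^{\Omega}$ still induce a connected graph. Its cost grows combinatorially with $\Omega$, so it is a test oracle rather than a protocol.

```python
from itertools import combinations


def is_connected(nodes: set[int], adj: dict[int, set[int]]) -> bool:
    """True if the subgraph induced by `nodes` is connected (empty counts as connected)."""
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to `nodes`
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def connectivity_doft_holds(adj: dict[int, set[int]], active: set[int],
                            ft: set[int], omega: int) -> bool:
    """Connectivity part of Omega-DOFT only (coverage is ignored): for every set
    of 1..omega failed active nodes, the surviving nodes of S_a | S_f must still
    induce a connected graph."""
    backbone = active | ft
    for size in range(1, omega + 1):
        for failed in combinations(sorted(active), size):
            if not is_connected(backbone - set(failed), adj):
                return False
    return True


# Active path 1 - 2 - 3, plus FT node 4 linked to nodes 1 and 3.
adj = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(connectivity_doft_holds(adj, {1, 2, 3}, {4}, omega=1))    # True
print(connectivity_doft_holds(adj, {1, 2, 3}, set(), omega=1))  # False
```

Without node 4, the failure of the middle node 2 disconnects nodes 1 and 3; with node 4, any single active failure is tolerated, but the simultaneous failure of nodes 1 and 3 is not.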


4. Connectivity-Oriented Fault Tolerance

In this section, we focus on the analysis of fault tolerance for communication connectivity. Fault tolerance for sensing coverage is discussed in the next section.

4.1. An Upper Bound on the Number of Fault Tolerance Nodes

We first consider the case of 1-DOFT, i.e., $\Omega = 1$. Let $N_k$ be the set of neighbors of $s_k$, $N_k^a$ be the set of its active neighbors, and $N_k^s$ be the set of its sleeping neighbors. Let $\Delta_k$ be the number of neighboring nodes of $s_k$, $\Delta_k^a$ be the number of active neighboring nodes of $s_k$, and $\Delta_k^s$ be the number of sleeping neighboring nodes of $s_k$. In other words, $\Delta_k = |N_k|$, $\Delta_k^a = |N_k^a|$, and $\Delta_k^s = |N_k^s|$. It is trivial to see that $\forall s_k \in S$, $\Delta_k \ge 1$; otherwise $S$ is not connected. Since communication is carried by the connected backbone $S_a$, communication connectivity is not affected if any node in $S_s$ fails. This is also true if multiple nodes in $S_s$ fail. Therefore, any number of sleeping nodes in $S_s$ can fail, either sequentially or simultaneously. This implies that only active nodes need to be considered as failing nodes in the analysis of connectivity fault tolerance.

It can be seen that if $\exists s_k \in S$ such that $\Delta_k = 1$, then $\Omega$-DOFT ($\Omega \ge 1$) cannot be achieved for the network, since when the single neighbor of $s_k$ fails, $s_k$ is disconnected from the rest of the network [32]. For any wireless sensor network with $S_a$ ($S_a \neq \emptyset$), $\forall s_k \in S$, $s_k$ is connected to at least one node in $S_a$, i.e., $\Delta_k^a \ge 1$. Therefore, $\Delta_k \ge \Delta_k^a \ge 1$. In a sensor network with $S_a$ as a backbone for both sensing and communication, if $s_k \notin S_a$, i.e., $s_k$ is a sleeping node, we can expect $\Delta_k > 1$ due to the need for sensing coverage; otherwise an active node would have to be located at the same location as $s_k$. This observation leads to a lower bound on the node density required in the sensor field for fault tolerance. This lower bound can be used as a necessary condition for fault-tolerant sensor node deployment.

Consider a total of $n$ nodes, each with communication radius $r_c$, in a sensor field with area $A$. In order to achieve $\Omega$-DOFT ($\Omega \ge 1$), a lower bound on the total number of nodes $n$ in the sensor field is given by

$$n \ge \frac{3A}{\pi r_c^2}.$$

The proof, which can be found in [32], is straightforward and is therefore omitted. For example, consider the extreme case of $A = \pi r_c^2$. For this case, we must have $n \ge 3$. This is obviously true, since if there are only two nodes, neither of them can fail. In the following discussion, we assume that the initial sensor deployment provides a sufficient number of nodes for fault tolerance. Our goal is to designate extra sleeping nodes as backup nodes, i.e., FT nodes, to provide fault tolerance when currently selected active nodes fail. We also need to minimize the number of FT nodes. Before we present bounds on the number of FT nodes needed to achieve $\Omega$-DOFT, we prove the following theorem.
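Before turning to Theorem 1, note that the density bound $n \ge 3A/(\pi r_c^2)$ is easy to evaluate for a concrete deployment. The sketch below rounds the bound up to an integer; the function name is hypothetical, and the small tolerance that guards against floating-point round-off (so that $A = \pi r_c^2$ yields exactly 3) is an implementation choice of ours.

```python
import math


def min_nodes_for_fault_tolerance(area: float, r_c: float) -> int:
    """Smallest integer n with n >= 3A / (pi * r_c^2), the necessary condition
    for Omega-DOFT (Omega >= 1)."""
    ratio = 3.0 * area / (math.pi * r_c ** 2)
    return math.ceil(ratio - 1e-9)  # tolerance absorbs floating-point round-off


print(min_nodes_for_fault_tolerance(100.0 * 100.0, 10.0))        # 96
print(min_nodes_for_fault_tolerance(math.pi * 10.0 ** 2, 10.0))  # 3
```

For a 100 m by 100 m field with $r_c = 10$ m, at least 96 nodes are needed; this is only a necessary condition, not a guarantee of fault tolerance.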

Theorem 1. Let $s_k \in S$ be a node in the sensor network. Let the region that lies within the communication range $r_c$ of $s_k$ be $A_k^*$, and let $S^*$ be the set of nodes within $A_k^*$. Assume that all nodes in $S^*$ are connected to each other, i.e., $\forall s_i, s_j \in S^*$, there exists a routing path from $s_i$ to $s_j$. In order to ensure communication connectivity between the nodes in $S^*$ if $s_k$ fails, it is sufficient to have 10 nodes (not counting $s_k$) in $A_k^*$.

Proof. Let $G(V, E)$ be the connected graph representing $S^*$, i.e., $|V| = |S^*|$, $v_k$ is the vertex representing $s_k \in S$, and $\forall u, v \in V$, $(u, v) \in E$ if $d(u, v) \le r_c$. Let $G_c(V_c, E_c)$ be a subgraph corresponding to a connected dominating set (CDS) of $G$ [3,4,8,25,28]. We first derive an upper bound on the number of vertices needed for a CDS. The circular area $A_k^*$ with radius $r_c$ can be divided into six sectors, denoted by $A_1, \ldots, A_6$ in Fig. 2(a). Each sector $A_i$ ($1 \le i \le 6$) has an opening angle of $\pi/3$. From Fig. 2(a), the nodes in $S^*$ can be located in one or multiple sectors, corresponding to the vertices in $V$ in these sectors. Excluding equivalent cases due to symmetry, we list all possibilities for the locations of the vertices in Fig. 2(b).
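The key geometric fact behind Case 1 is that a sector of radius $r_c$ with opening angle $\pi/3$ has diameter exactly $r_c$, so any two nodes inside one sector are neighbors. The sketch below (function names are ours) maps a neighbor's offset from $s_k$ to a sector index and checks the diameter claim numerically by sampling points in $A_1$ together with its three corners; sampling illustrates the claim but does not prove it.

```python
import math
import random


def sector_of(dx: float, dy: float) -> int:
    """Index (1..6) of the pi/3 sector of A_k^* that contains offset (dx, dy) from s_k."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5) + 1  # min() guards angle == 2*pi


def max_spread_in_sector(r_c: float, samples: int = 300, seed: int = 0) -> float:
    """Largest pairwise distance among random points of sector A_1 plus its corners."""
    rng = random.Random(seed)
    pts = [(0.0, 0.0), (r_c, 0.0),
           (r_c * math.cos(math.pi / 3), r_c * math.sin(math.pi / 3))]
    for _ in range(samples):
        r = r_c * math.sqrt(rng.random())  # uniform over the sector's area
        theta = rng.uniform(0.0, math.pi / 3)
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return max(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])


print(sector_of(1.0, 0.1), sector_of(-1.0, 0.1), sector_of(0.1, -1.0))  # 1 3 5
print(max_spread_in_sector(1.0))  # about 1.0, attained by the sector's corners
```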

Case 1: All vertices are located in the same sector. Assume this sector is $A_1$, as shown in Fig. 2(b)[a]. Obviously, for any two vertices $u, v \in V$ within $A_1$, $d(u, v) \le r_c$, which includes the case where $u$ and $v$ are located at the sector boundaries. We can simply let $V_c = \{u\}$, where $u$ is an arbitrarily chosen vertex. Therefore, $|V_c| = 1$. For example, suppose an active node $s_k$ has only two neighbors, both in one sector, so that there are a total of 3 nodes within the communication region of $s_k$. Fault tolerance can be achieved for the failure of $s_k$ because one of its two neighbors can be designated as an FT node.

Case 2: All vertices are located within two sectors. There are three possibilities for the sectors $A_1$ and $A_2$, as shown in Fig. 2(b)[b], 2(b)[c], and 2(b)[d], respectively. Since $G$ is connected, $\exists (u, v) \in E$ such that $u$ is in $A_1$ and $v$ is in $A_2$. Moreover, by Case 1, $\forall u_i \in A_1$, $(u, u_i) \in E$, and $\forall v_j \in A_2$, $(v, v_j) \in E$. Therefore, $V_c = \{u, v\}$ is a CDS of $G$ and $|V_c| \le 2$.

Case 3: All vertices are located within three sectors. There are four possibilities for the sectors $A_1$, $A_2$, and $A_3$, as shown in Fig. 2(b)[e], 2(b)[f], 2(b)[g], and 2(b)[h], respectively. Let $u$ be an arbitrarily chosen vertex in $A_1$. Since $G$ is connected, $\exists (u, v) \in E$ such that $v$ is either in $A_2$ or in $A_3$. Without loss of generality, assume that $v$ is in $A_2$. Similarly, for $w \in A_3$, $\exists (w, x) \in E$ such that $x$ is either in $A_1$ or in $A_2$. Therefore, $V_c = \{u, v, w, x\}$ is a CDS of $G$ and $|V_c| \le 4$.

[Figure 2. Legend: connected dominating set (CDS) nodes; non-CDS nodes.]

FIGURE 2 Illustration of the proof of Theorem 1: (a) A node's communication region can be divided into six sectors with an opening angle of $\pi/3$. (b) All possible locations of the vertices. (c) Illustration of the six cases corresponding to Theorem 1.

Case 4: All vertices are located within four sectors. There are three possibilities for the sectors $A_1$, $A_2$, $A_3$, and $A_4$, as shown in Fig. 2(b)[i], 2(b)[j], and 2(b)[k], respectively. Divide these four sectors into two groups, where one group has three sectors and the other group has one sector. Assume that $A_1$, $A_2$, and $A_3$ are in one group and the other group contains $A_4$. From Case 2, $\exists (u, v) \in E$, where $u$ is in $A_4$ and $v$ is in $A_1$, $A_2$, or $A_3$. Furthermore, from the proof for Case 3, $\exists V_1 = \{w_1, w_2, w_3, w_4\}$, where $V_1$ is a CDS for the vertices in $A_1$, $A_2$, and $A_3$. Therefore, $V_c = \{u, v, w_1, w_2, w_3, w_4\}$ is a CDS of $G$ and $|V_c| \le 6$.

Case 5: All vertices are located within five sectors. There is only one possibility for the sectors $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$, as shown in Fig. 2(b)[l]. Similar to the proof for Case 4, we divide these five sectors into two groups, where one group contains any four of the five sectors and the other group contains the remaining sector. Assume that $A_1$, $A_2$, $A_3$, and $A_4$ are in one group and $A_5$ is in the other group. From Case 2, $\exists (u, v) \in E$, where $u$ is in $A_5$ and $v$ is in $A_1$, $A_2$, $A_3$, or $A_4$. Furthermore, from the proof for Case 4, $\exists V_2 = \{w_1, w_2, \ldots, w_6\}$, where $V_2$ is a CDS for the vertices in $A_1$, $A_2$, $A_3$, and $A_4$. Therefore, $V_c = \{u, v, w_1, \ldots, w_6\}$ is a CDS of $G$ and $|V_c| \le 8$.

Case 6: Vertices are located in all six sectors. There is only one possibility for the sectors $A_1$, $A_2$, $A_3$, $A_4$, $A_5$, and $A_6$, as shown in Fig. 2(b)[m]. Similar to Cases 4 and 5, we divide these six sectors into two groups, where one group contains five sectors and the other group contains one sector. Assume that $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ are in one group and $A_6$ is in the other group. From Case 2, $\exists (u, v) \in E$, where $u$ is in $A_6$ and $v$ is in $A_1$, $A_2$, $A_3$, $A_4$, or $A_5$. Furthermore, from Case 5, $\exists V_3 = \{w_1, w_2, \ldots, w_8\}$, where $V_3$ is a CDS for the vertices in $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$. Therefore, $V_c = \{u, v, w_1, \ldots, w_8\}$ is a CDS of $G$ and $|V_c| \le 10$.

The nodes corresponding to $V_c$ thus keep all nodes in $S^*$ connected even when $s_k$ fails. Therefore, the maximum number of FT nodes required for $s_k$ is 10.
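The case analysis follows a simple pattern: one vertex for a single sector, two for two sectors, and two more for each additional sector, because every extra sector is attached through one bridging edge. The recursion below (name hypothetical) only restates that bookkeeping; it does not construct a CDS, but it reproduces the bound of 10 used in Theorem 1.

```python
def cds_size_bound(num_sectors: int) -> int:
    """Upper bound on |V_c| from the case analysis when the vertices of S*
    occupy `num_sectors` of the six sectors (bookkeeping only, no CDS is built)."""
    if not 1 <= num_sectors <= 6:
        raise ValueError("num_sectors must be between 1 and 6")
    if num_sectors <= 2:
        return num_sectors  # Case 1: {u}; Case 2: {u, v}
    # Cases 3 to 6: one more sector adds one bridging edge, i.e. two vertices.
    return cds_size_bound(num_sectors - 1) + 2


print([cds_size_bound(j) for j in range(1, 7)])  # [1, 2, 4, 6, 8, 10]
```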

Figure 2(c) illustrates each of the six cases discussed in Theorem 1. Based on Theorem 1, we can derive an upper bound on the number of FT nodes needed within the communication region of an arbitrarily chosen node. Assume that $N_k$ is the set of neighbors of $s_k \in S$. Consider the special case where $S = N_k \cup \{s_k\}$, i.e., all nodes in $S \setminus \{s_k\}$ are neighbors of $s_k$. Suppose the nodes in $N_k$ are not connected. When $s_k$ fails, $\exists s_i, s_j \in N_k$ such that no routing path can be formed between $s_i$ and $s_j$. Thus fault tolerance can only be achieved if there is sufficient node density in the network. Let $\Gamma_k$ be the number of FT nodes required for an arbitrarily chosen node $s_k$ in a 1-DOFT sensor network. Next, we present a sufficient condition relating fault tolerance to $\Gamma_k$ in the following theorem.

Theorem 2. The network is 1-DOFT with respect to the failure of any node $s_k \in S_a$ if, $\forall s_k \in S_a$, the nodes in $N_k$ are connected and $\Gamma_k \ge 10$.

The proof of Theorem 2 is given in the appendix. Note that we need $\Gamma_k = 10$ only when $s_k$ has no active neighbors, i.e., $\Delta_k^a = 0$. This is shown as Case 6 in Fig. 2(c). Since $S_a$, as a backbone, is a non-empty set that connects all nodes, $\Gamma_k$ needs to be 10 only if $|S_a| = 1$ and $S_a = \{s_k\}$. This means that all nodes are deployed within $s_k$'s communication region and only $s_k$ is active. Generally, $\forall s_k \in S_a$, $\Delta_k^a \ge 1$ since $|S_a| > 1$, which implies the following corollary [32]; its proof is given in the appendix.

Corollary 1. When the number of active nodes is greater than one, i.e., $|S_a| > 1$, the sensor network is 1-DOFT with respect to the failure of any node $s_k \in S_a$ if, $\forall s_k \in S_a$, $N_k$ is connected and there are 9 or more FT neighboring nodes for $s_k$.
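The sufficient condition in Theorem 2 and Corollary 1 uses only information that $s_k$ has locally: whether its neighborhood $N_k$ induces a connected subgraph, and how many FT nodes it has among its neighbors. The following Python sketch checks that condition, assuming the network is given as a dictionary mapping each node ID to the set of its neighbors and that the active and FT nodes are Python sets; the function names are hypothetical and not part of the original scheme. Note that it tests a sufficient, not a necessary, condition.

```python
from collections import deque


def is_connected(nodes, adj):
    """BFS over the subgraph induced by `nodes`; the empty set counts as connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def meets_1doft_condition(k, adj, active, ft):
    """Sufficient 1-DOFT condition for node k (Theorem 2 / Corollary 1)."""
    if not is_connected(adj[k], adj):
        return False
    gamma_k = len(adj[k] & ft)  # number of FT neighbours of s_k
    required = 10 if len(active) == 1 else 9  # Theorem 2 vs. Corollary 1
    return gamma_k >= required
```

For example, if node 0 is adjacent to nodes 1 through 10, which form a path among themselves, and nodes 1 through 9 are FT nodes, the condition holds when another node is active (nine FT neighbors suffice) but not when node 0 is the only active node.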

Corollary 1 shows that $|N_k^a|$ is a measure of the communication connectivity support provided by the active neighbors of $s_k$ when $s_k$ fails. In fact, $|N_k^a| > 0$ implies that there exists built-in fault tolerance for $s_k$. The fault tolerance provided by the active neighbors in $N_k$ decreases the maximum number of FT nodes needed when $s_k$ fails. Note that the above is true only for $\Omega = 1$, since when $\Omega > 1$, nodes in $N_k$ may also fail at the same time as $s_k$. Both Theorem 2 and Corollary 1 assume that when $s_k \in S_a$ fails, the selected FT nodes for $s_k$ do not fail. Since FT nodes are selected to provide fault tolerance for active nodes in $S_a$, their own failures are not considered in the analysis. However, the same procedure of selecting FT nodes for active nodes in $S_a$ can be applied repeatedly to select more FT nodes in a sequential manner.

Our goal in this paper is to develop a distributed self-organization algorithm, where nodes rely only on single-hop or restricted-hop knowledge. Therefore, we allow each active node $s_k \in S_a$ to select FT nodes only from its sleeping neighbors. Recall that we denote the set of FT nodes in an $\Omega$-DOFT sensor network as $S_t^\Omega$. Let $N_k^\Omega$ be the set of FT neighbors for an arbitrarily-chosen $s_k \in S_a$ in an $\Omega$-DOFT network. Obviously, $N_k^\Omega \subseteq N_k^s$ and $\Gamma_k = |N_k^\Omega|$. When each active node finds its corresponding $N_k^\Omega$, the set $S_t^\Omega$ is determined, i.e., $S_t^\Omega = \bigcup_{s_k \in S_a} N_k^\Omega$, where the total number of FT nodes in this $\Omega$-DOFT sensor network is $|S_t^\Omega|$.

Next, we derive an upper bound on the total number of FT nodes needed for the entire sensor network. Consider a wireless sensor network consisting of $n$ nodes, each with communication radius $r_c$. Let the set of nodes be denoted by $S$. Assume that all nodes in $S$ are connected, i.e., $\forall s_i, s_j \in S$, there exists a routing path from $s_i$ to $s_j$. Let $G(V, E)$ be the connected graph corresponding to $S$, i.e., $V = S$, and let $v_k$ be the vertex representing $s_k \in S$, where $\forall u, v \in V$, $(u, v) \in E$ if $d(u, v) \le r_c$. Assume that $S_a$ is the set of (active) backbone nodes. The subgraph corresponding to $S_a$ is denoted by $G_a(V_a, E_a)$, where $G_a$ is a CDS of $G$. Let $S_t^1$ be the set of nodes selected as FT nodes to achieve 1-DOFT. For the 1-DOFT case, the bound is given directly by Theorem 3, whose proof is given in the appendix.

Theorem 3. An upper bound on the total number of FT nodes needed to achieve 1-DOFT is given by:

$$|S_t^1| \le \begin{cases} 10, & \text{if } |V_a| = 1; \\ 9|V_a| - |E_a|, & \text{if } |V_a| > 1. \end{cases} \tag{3}$$
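As a quick arithmetic illustration of Eq. (3), the bound depends only on the size of the backbone subgraph $G_a(V_a, E_a)$. The helper below is a hypothetical name introduced for this sketch; for example, a backbone that forms a simple chain of five active nodes (four edges) gives $9 \cdot 5 - 4 = 41$.

```python
def ft_upper_bound_1doft(num_active, num_active_edges):
    """Upper bound of Eq. (3) on |S_t^1| from |V_a| and |E_a|."""
    if num_active < 1:
        raise ValueError("the backbone must contain at least one active node")
    if num_active == 1:
        return 10
    return 9 * num_active - num_active_edges


# A backbone that is a chain of five active nodes has four edges.
print(ft_upper_bound_1doft(5, 4))  # 41
```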


Next, we consider a more general fault tolerance scenario where $\Omega > 1$. Note that we assume $|S_a| \ge \Omega$ for the analysis of $\Omega$-DOFT; otherwise $\Omega$-DOFT is not meaningful. In the following, we determine the number of nodes $\Gamma_k$ needed for an arbitrarily-chosen active node $s_k$ to achieve $\Omega$-DOFT in its communication region, and we assume that $|N_k^a| \ge \Omega - 1$ to simplify the discussion. Note also that since $\Omega > 1$, we have $|S_a| > 1$. Therefore, we can ignore the special case where only one node is active and all other nodes are placed within its communication range.


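Theorem 4. The network is $\Omega$-DOFT ($\Omega > 1$) with respect to failures of any $\Omega$ nodes inside the communication region of an arbitrarily-chosen $s_k \in S_a$ ($1 < \Omega \le |N_k^a| + 1$), if the nodes in $N_k$ are connected and $\Gamma_k \ge \Omega$. Moreover, $\Gamma_k$ is lower-bounded by the following:

$$\Gamma_k \ge \begin{cases} \Omega + 9, & \text{if } s_k \text{ fails and } \Omega = |N_k^a| + 1; \\ \Omega + 8, & \text{if } s_k \text{ fails and } \Omega < |N_k^a| + 1; \\ \Omega, & \text{if } s_k \text{ does not fail.} \end{cases} \tag{4}$$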
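Eq. (4) is a simple case analysis, which the following Python sketch encodes (the function name is hypothetical). Theorem 4 is stated for $\Omega > 1$; the sketch also accepts $\Omega = 1$, but only as a sanity check, because plugging in $\Omega = 1$ reproduces the constants of Theorem 2 (10 FT nodes when $s_k$ has no active neighbors) and Corollary 1 (9 FT nodes otherwise).

```python
def gamma_lower_bound(omega, num_active_neighbors, sk_fails):
    """Lower bound of Eq. (4) on Gamma_k, the number of FT nodes near s_k."""
    if not 1 <= omega <= num_active_neighbors + 1:
        raise ValueError("requires 1 <= Omega <= |N_k^a| + 1")
    if not sk_fails:
        return omega
    if omega == num_active_neighbors + 1:  # s_k and all its active neighbours fail
        return omega + 9
    return omega + 8


print(gamma_lower_bound(1, 0, True))  # 10, as in Theorem 2
print(gamma_lower_bound(1, 3, True))  # 9, as in Corollary 1
print(gamma_lower_bound(3, 2, True))  # 12
```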

The proof of Theorem 4 is given in the appendix. We now present bounds on the total number of FT nodes needed to achieve $\Omega$-DOFT ($|S_a| \ge \Omega > 1$). Note that for an $\Omega$-DOFT sensor network, if $\exists s_k \in S_a$ such that $\Omega > |N_k^a| + 1$, the DOFT in the communication region of $s_k$ is at most $|N_k^a| + 1$. In this case, since the maximum number of failing nodes within the communication region of $s_k$ is $|N_k^a| + 1$, $\Omega$-DOFT for $s_k$ refers to the failure of up to $|N_k^a| + 1$ nodes inside the communication region of $s_k$ and the failure of $\Omega - (|N_k^a| + 1)$ nodes outside it. Thus, when $\Omega$-DOFT is achieved for the entire sensor network, fault tolerance with the maximum number of failing nodes in the communication region of $s_k$ is automatically achieved. Let $S_f \subseteq S_a$ be the set that contains $\Omega$ failing active nodes, where the subgraph representing $S_f$ is denoted by $G_f(V_f, E_f)$. Let $S_t^\Omega$ be the set of nodes selected as FT nodes to achieve $\Omega$-DOFT in the sensor network. The bound is given by Theorem 5, whose proof is given in the appendix.

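Theorem 5. An upper bound on the total number of FT nodes needed to achieve $\Omega$-DOFT is given as

$$|S_t^\Omega| \le \begin{cases} 10|V_a| - 4|E_a|, & \text{if } G_f \text{ is connected}; \\ 9|V_a|, & \text{if } G_f \text{ is not connected and } E_f = \emptyset; \\ 9|V_a| - 2, & \text{if } G_f \text{ is not connected and } E_f \neq \emptyset. \end{cases} \tag{5}$$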
4.2. Lower Bound on the Number of Fault Tolerance Nodes

To reduce energy consumption, it is desirable to minimize the number of FT nodes needed, i.e., to minimize the size of $S_t^\Omega$. In this section, we present a lower bound on the number of FT nodes needed to achieve the required $\Omega$-DOFT ($\Omega \ge 1$) in wireless sensor networks. Let $N_k^f \subseteq N_k^a$ be the set of failing active neighbors of $s_k$, i.e., $S_f = \bigcup_{s_k \in S_a} N_k^f$. Let $N_k^\Omega \subseteq N_k^s$ be the set of FT nodes for $s_k$, i.e., $S_t^\Omega = \bigcup_{s_k \in S_a} N_k^\Omega$.

We know from the previous subsections that, $\forall s_k \in S_a$, the FT nodes of $s_k$ keep all neighbors of $s_k$ in $N_k$ connected. This implies that the subgraph representing $N_k^\Omega$ is a CDS of the subgraph representing $N_k$. When $\Omega = 1$, the minimization of $|S_t^1|$ is equivalent to finding the MCDS for the subgraph representing $N_k$ for each active node $s_k \in S_a$. However, since no failing active node has any failing active neighbors for $\Omega = 1$, such an MCDS for $s_k$ also contains the existing active neighbors in $N_k$ as existing dominating nodes. Let $S_t^1$ be the set of nodes selected as FT nodes to achieve 1-DOFT in the sensor network. It is then easy to see that a lower bound on the total number of FT nodes needed to achieve 1-DOFT, i.e., on $|S_t^1|$, is given by:


$$|S_t^1| \ge \begin{cases} 1, & \text{if } |S_a| = 1; \\ 0, & \text{if } |S_a| > 1. \end{cases} \tag{6}$$


Note that the best case of $|S_t^1| = 0$ when $|S_a| > 1$ rarely happens in practice, because it requires that the neighbors of any active node are also neighbors of at least one other active node. This implies that all nodes are within a circle of radius $r_c$. Since $|S_a| > 1$, this makes the other $|S_a| - 1$ active nodes unnecessary. It is possible to have several such nodes, but if $|S_a|$ is very large, there will be a significant energy overhead for these nodes. When $\Omega > 1$, the analysis is more complicated because when an active node $s_k$ fails, some active neighbors in $N_k$ may also fail at the same time.

To simplify the discussion, we define the function $\mathcal{M}$ as follows: $\hat{S}_a = \mathcal{M}(S, S_a)$, where

1. $S_a \subseteq \hat{S}_a$;

2. The subgraph representing $\hat{S}_a$ is a connected dominating set (CDS) of the graph representing $S$;

3. Among all sets that satisfy 1) and 2), $\hat{S}_a$ has the smallest size.

We refer to determining $\hat{S}_a$ as a constrained minimum connected dominating set (constrained MCDS) problem. Note that if $S_a = \emptyset$, then $\hat{S}_a$ is the MCDS of $S$. To achieve $\Omega$-DOFT ($\Omega > 1$) in the wireless sensor network, we need to find the set of FT nodes $S_t^\Omega$ such that $S_t^\Omega = \bigcup_{\forall |S_f| \le \Omega,\, S_f \subseteq S_a} \mathcal{M}(S \setminus S_f, S_a \setminus S_f)$. Let $N_k^f \subseteq N_k^a$ be the set of failing active neighbors of $s_k$. We can obtain a lower bound on the number of FT nodes needed to achieve $\Omega$-DOFT ($\Omega > 1$) as follows [32]:


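$$\begin{aligned} |S_t^\Omega| &= \Big|\bigcup_{\forall |S_f| = \Omega,\, S_f \subseteq S_a}\ \bigcup_{s_k \in S_f} N_k^\Omega\Big| \ \ge\ \Big|\bigcup_{\forall |S_f| \le \Omega,\, S_f \subseteq S_a}\ \bigcup_{\forall s_k \in S_f} \mathcal{M}\big(N_k \setminus N_k^f,\ N_k^a \setminus N_k^f\big)\Big|, \\ |S_t^\Omega| &\ge \Big|\bigcup_{\forall |S_f| \le \Omega,\, S_f \subseteq S_a} \mathcal{M}\big(S \setminus S_f,\ S_a \setminus S_f\big)\Big|. \end{aligned} \tag{7}$$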


Note that if $\Omega = |S_a|$, then $S_a \setminus S_f = \emptyset$, and hence $|S_t^\Omega| \ge |\mathcal{M}(S \setminus S_a, \emptyset)|$.
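Because the constrained MCDS problem is NP-complete, $\mathcal{M}$ can only be computed exactly by exhaustive search, which is feasible only for toy graphs. The brute-force Python sketch below follows the three conditions in the definition of $\mathcal{M}$ literally; the function names and the adjacency-dictionary representation are assumptions made for illustration, and its exponential running time is exactly why a heuristic is needed in practice.

```python
from collections import deque
from itertools import combinations


def _connected(nodes, adj):
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def is_cds(cand, graph_nodes, adj):
    """cand is connected and every vertex is in cand or adjacent to it."""
    return _connected(cand, adj) and all(
        v in cand or adj[v] & cand for v in graph_nodes)


def constrained_mcds(graph_nodes, required, adj):
    """Brute-force M(S, S_a): a smallest CDS of graph_nodes containing required."""
    graph_nodes, required = set(graph_nodes), set(required)
    optional = sorted(graph_nodes - required)
    for extra in range(len(optional) + 1):
        for combo in combinations(optional, extra):
            cand = required | set(combo)
            if is_cds(cand, graph_nodes, adj):
                return cand
    return None  # e.g., the graph is disconnected


# Path 1-2-3-4-5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(constrained_mcds(path, set(), path))  # {2, 3, 4}
print(constrained_mcds(path, {1}, path))    # {1, 2, 3, 4}
```

In the notation above, a call such as `constrained_mcds(S - S_f, S_a - S_f, adj)` corresponds to one term $\mathcal{M}(S \setminus S_f, S_a \setminus S_f)$ of the union that defines $S_t^\Omega$.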
4.3. Connectivity-oriented Selection of Fault Tolerance Nodes


Since the MCDS problem is NP-complete [3,4,8,25,28], finding the constrained MCDS to achieve $\Omega$-DOFT as shown in Equation (7) is also NP-complete. When only single-hop knowledge is available, for any $s_k \in S_a$, there are a total of $\sum_{i=1}^{\Omega} \binom{|N_k^a|}{i}$ possible combinations of failing nodes for $s_k$; as a result, the total number of combinations of failing nodes for all the active nodes is $\sum_{s_k \in S_a} \sum_{i=1}^{\Omega} \binom{|N_k^a|}{i}$. Each evaluation requires finding the MCDS for the neighbors of the failing nodes. Even though failing active nodes may share many neighbors, a thorough evaluation in this way is still computationally very expensive.
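To give a sense of how quickly the exhaustive evaluation grows, the following sketch counts failure combinations, assuming (as read here) that node $s_k$ has $\sum_{i=1}^{\Omega}\binom{|N_k^a|}{i}$ combinations to examine and that each one needs its own MCDS computation. The function names are hypothetical. With $|N_k^a| = 8$ and $\Omega = 4$, a single node already faces $8 + 28 + 56 + 70 = 162$ combinations.

```python
from math import comb


def failure_combinations(omega, num_active_neighbors):
    """Number of failure sets of size 1..Omega among one node's active neighbours."""
    return sum(comb(num_active_neighbors, i) for i in range(1, omega + 1))


def total_failure_combinations(omega, active_neighbor_counts):
    """Sum of the per-node counts over all active nodes in S_a."""
    return sum(failure_combinations(omega, n) for n in active_neighbor_counts)


print(failure_combinations(4, 8))                # 162
print(total_failure_combinations(2, [4, 4, 3]))  # 26
```

`math.comb(n, i)` returns 0 when `i > n`, so values of $\Omega$ larger than $|N_k^a|$ simply stop adding new terms.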

For a wireless sensor network with a set $S_a$ of active nodes serving as a backbone, the maximum number of nodes that can fail is $|S_a|$. We propose the following distributed procedure to achieve fault tolerance for the simultaneous failure of up to $|S_a|$ nodes. The proposed distributed procedure is based on the algorithm from [28]. Note that other heuristics, such as the algorithms described in [3,25], can also be used as the basis for building our distributed procedure, since the proposed fault tolerance procedure is a stand-alone module operating on the existing subset of backbone nodes. The procedure contains three steps, as shown in Fig. 3.

In Step 1 of Fig. 3, each active node selects an FT node for each pair of its active neighbors that are not directly connected. We refer to this type of FT node as a gateway FT node, since it provides an alternative routing path for the active neighbors of the failing node. When that potentially failing node actually fails, the network traffic from the failing node to its active neighbors can still be delivered. Though gateway FT nodes are able to take care of routing data originating from failing active nodes, they are not necessarily connected to each other and are not necessarily connected to the sleeping neighbors of the failing active node. Step 2 in Fig. 3 deals with this problem by using a modified version of the algorithm proposed in [28], which is a distributed approach for constructing a CDS for a connected, but not completely connected, graph. In the worst case, when all nodes in $S_a$ fail at the same time, the subgraph representing the FT nodes should be a CDS of the subgraph representing $S_s$, the set of sleeping nodes. We can therefore utilize the algorithm proposed in [28] with the target graph representing $S_s$. Note that in Step 2, we have


Distributed FT nodes selection procedure

/* Potential failing node $s_k$ selects gateway FT nodes */

Step 1. $\forall s_k \in S_a$, for each pair of active neighbors that are not directly connected, $s_k$ selects $s_i \in N_k$ as an FT node if $s_i$ connects both of them.

/* Ensure that FT nodes are connected */

Step 2. $\forall s_k \in S_s$,

Step 2.1. if $s_k$ has two FT neighbors that are not connected, then $s_k$ assigns itself as an FT node;

Step 2.2. if $s_k$ has an FT neighbor and a sleeping neighbor that are not connected, then $s_k$ assigns itself as an FT node;

/* Each node must have at least one FT neighbor node. */

Step 3. $\forall s_k \in S_s$, if $s_k$ has no FT neighbors, then $s_k$ assigns itself as an FT node.


FIGURE 3 Distributed fault tolerance nodes selection procedure.
already found the gateway FT nodes; therefore, Step 2 needs only to check the connectivity of disconnected FT nodes. To ensure that the proposed distributed procedure is also applicable to more general scenarios, Step 3 is added to handle the case where the subgraph representing $S_s$ is a completely connected graph. Let $\Delta$ be the maximum node degree. In Fig. 3, Step 1 takes $O(\Delta^3)$ time, Step 2 takes $O(\Delta^2)$ time, and Step 3 takes $O(\Delta)$ time. Therefore, the proposed procedure takes $O(\Delta^3)$ time. We next prove that the proposed distributed procedure achieves $|S_a|$-DOFT for a wireless sensor network with the set of active nodes given by $S_a$.
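To make the rules of Fig. 3 concrete, the following Python sketch simulates them centrally on an adjacency dictionary. It is an illustration under stated assumptions, not the authors' implementation: Step 1 picks the gateway with the smallest node ID among the sleeping neighbors of $s_k$ (Fig. 3 does not specify a tie-break); Steps 2.1 and 2.2 are merged into one test; Step 3 is applied to sleeping nodes, as the proof of Theorem 6 requires; and Steps 2 and 3 are re-applied until no node changes state, which is our reading of how the distributed marking settles. The hypothetical helper `is_cds` checks the property established in Theorem 6.

```python
from collections import deque
from itertools import combinations


def connected(nodes, adj):
    """True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def select_ft_nodes(adj, active):
    """Centralized simulation of the three marking rules of Fig. 3."""
    active = set(active)
    sleeping = set(adj) - active
    ft = set()
    # Step 1: gateway FT node for every non-adjacent pair of active neighbours.
    for k in sorted(active):
        for a, b in combinations(sorted(adj[k] & active), 2):
            if b in adj[a]:
                continue
            gateways = sorted(adj[k] & sleeping & adj[a] & adj[b])
            if gateways:
                ft.add(gateways[0])  # assumed tie-break: smallest ID
    changed = True
    while changed:
        changed = False
        # Step 2 (2.1 and 2.2 merged): an FT neighbour x and another sleeping
        # neighbour y that are not adjacent make v join the FT set.
        grew = True
        while grew:
            grew = False
            for v in sorted(sleeping - ft):
                nbrs = adj[v] & sleeping
                if any(y != x and y not in adj[x] for x in nbrs & ft for y in nbrs):
                    ft.add(v)
                    grew = changed = True
        # Step 3: a sleeping node without FT neighbours assigns itself.
        for v in sorted(sleeping - ft):
            if not adj[v] & ft:
                ft.add(v)
                changed = True
    return ft


def is_cds(cand, nodes, adj):
    return connected(cand, adj) and all(v in cand or adj[v] & cand for v in nodes)


# Ladder: active path 0-1-2 on top, sleeping path 3-4-5 below, rungs 0-3, 1-4, 2-5.
ladder = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5}, 3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
ft = select_ft_nodes(ladder, {0, 1, 2})
print(sorted(ft), is_cds(ft, {3, 4, 5}, ladder))  # [3, 4, 5] True
```

On the ladder example, Step 1 finds no single common sleeping neighbor of nodes 0 and 2, Step 3 marks nodes 3 and 5, and the re-applied Step 2 then marks node 4, so the FT nodes form a CDS of the sleeping subgraph; running Step 3 only once after Step 2 would have left nodes 3 and 5 disconnected.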

Theorem 6. Assume that all nodes in $S$ are connected, i.e., $\forall s_i, s_j \in S$, there exists a routing path from $s_i$ to $s_j$. Assume that $S_a$ is the set of active nodes forming a backbone that keeps all nodes connected. Assume that $S_t$ is the set of FT nodes obtained from the distributed FT selection procedure given in Fig. 3. The set $S_t$ achieves $\Omega$-DOFT in this wireless sensor network, where $\Omega = |S_a|$.

Proof. Since the maximum number of nodes that can fail is $|S_a|$, we only need to show that the selected FT nodes in $S_t$ are able to keep the network connected when all nodes in $S_a$ fail. Let $G_s(V_s, E_s)$ be the subgraph representing $S_s = S \setminus S_a$, and $G_t(V_t, E_t)$ be the subgraph representing $S_t$. To prove that $G_t$ is a CDS of $G_s$, we first show that $G_t$ is connected, and then we show that for any $v \in V_s$, $v$ is either in $V_t$ or adjacent to a vertex in $V_t$.

Consider any $u, v \in V_t$. Since $G_s$ is connected, $\exists P(u, v)$ as the shortest path from $u$ to $v$ in $G_s$, where $P(u, v) \subseteq V_s$ is the set of vertices in the path. If $|P(u, v)| = 2$, $u$ and $v$ are adjacent and the claim holds trivially. Assume $|P(u, v)| \ge 3$, and let $P(u, v) = \{u, u_1, u_2, \ldots, v\}$. Consider the vertex that follows $u$ in $P(u, v)$, i.e., $u_1$. Since $u \in V_t$ and, because $P(u, v)$ is a shortest path, $u$ and $u_2$ are not adjacent, Step 2 in Fig. 3 forces $u_1$ into $V_t$, irrespective of whether $u_2$ is in $V_t$. The same argument holds for $u_2$. Doing this repeatedly, we have $\forall w \in P(u, v)$, $w \in V_t$, i.e., $P(u, v) \subseteq V_t$. Next, $\forall v \in V_s$, from Step 3 in Fig. 3, $v$ is either an FT node or has at least one FT neighbor. Therefore, $G_t$ is a CDS of $G_s$.

5. Coverage-Centric Fault Tolerance

In Section 4, we discussed the $\Omega$-DOFT problem for fault-tolerant communication connectivity when up to $\Omega$ active nodes fail simultaneously ($\Omega \ge 1$). However, we should also take fault tolerance for sensing coverage into account to achieve the surveillance goal over the field of interest. This implies that the nodes selected as FT nodes must be able to provide enough sensing coverage over the areas that were originally under the surveillance of the $\Omega$ failing active nodes.

5.1. Loss of Sensing Coverage from Failing Nodes

Recall the collective coverage probability for a grid point $g_i$, defined in Section 3. Since only the active nodes in $S_a$ perform communication and sensing tasks, the collective coverage probability for $g_i$ actually comes from the nodes in $S_i^a$, where $S_i^a \subseteq S_i$ is the set of active nodes that can detect $g_i$. When nodes fail in the network, the set of active nodes that can detect $g_i$, i.e., $S_i^a$, changes with time, which subsequently changes the sensing coverage over that grid point. Let $q_i(S)$ be a mapping from a set $S$ of nodes to the coverage probability for grid point $g_i$, $p_i(t)$ be a mapping from a time instant $t$ to the coverage probability for grid point $g_i$, and $S_i(t)$ be a mapping from a time instant $t$ to a set of nodes. Then $S_i(t)$ is the set of nodes that can detect grid point $g_i$ at time instant $t$. For example, if at time instant $t$ only the nodes in the subset $S_i^a$, i.e., active nodes, detect grid point $g_i$, then $S_i(t) = S_i^a$ and $p_i(t) = q_i(S_i(t)) = q_i(S_i^a)$. Therefore, from Equation (2), the collective coverage probability of $g_i$ under the fault tolerance constraint is a function of time given as follows:


$$p_i(t) = q_i(S_i^a(t)) = 1 - \prod_{s_k \in S_i^a(t)} (1 - p_i^k), \tag{8}$$


where $S_i^a(t)$ is the set of active nodes that can still detect $g_i$ at time instant $t$. Therefore, the goal is to ensure that the selected FT nodes and the existing active nodes, i.e., the nodes in $S_a \cup S_t^\Omega$, are able to keep the sensor field adequately covered whenever up to $\Omega$ active nodes fail. Thus, successful sensing coverage over the sensor field for FTNS in wireless sensor networks is indicated by:


$$\forall g_i \in Q, \quad p_i(t) \ge p_{th}, \tag{9}$$

where $p_{th}$ is the coverage probability threshold defined in Section 3. Theorem 7 shows the relationship between the loss of sensing coverage and the fault-tolerant operation in wireless sensor networks.

Theorem 7. Assume that all nodes in $S$ are connected, i.e., $\forall s_i, s_j \in S$, there exists a routing path from $s_i$ to $s_j$. Let $Q$ be the set of all the grid points in the sensor field. Let $S_i$ be the set of nodes that can detect the grid point $g_i \in Q$ initially after deployment. Let $S_i(t)$ be the set of nodes that can detect $g_i$ at time $t$, and $S_i^f(t)$ be the set of failing active nodes for $g_i$ at time $t$. Throughout the operational lifetime of a sensor network, $\forall g_i \in Q$, the following must be satisfied for any time instant $t$:

$$p_i^f(t+1) \le \frac{p_i(t) - p_{th}}{1 - p_{th}}, \tag{10}$$

where $p_i(t) = 1 - \prod_{s_k \in S_i(t)} (1 - p_i^k)$ and $p_i^f(t+1) = 1 - \prod_{s_k \in S_i^f(t+1)} (1 - p_i^k)$.

Proof. Consider time instants $t$ and $t+1$. Obviously, we have $S_i(t) \subseteq S_i$ and $S_i(t) = S_i(t+1) \cup S_i^f(t+1)$, where the two sets are disjoint. From Equation (8), we have

$$p_i(t) = 1 - \prod_{s_k \in S_i(t)} (1 - p_i^k) = 1 - \prod_{s_k \in S_i(t+1) \cup S_i^f(t+1)} (1 - p_i^k) = 1 - \prod_{s_k \in S_i(t+1)} (1 - p_i^k) \prod_{s_k \in S_i^f(t+1)} (1 - p_i^k).$$

Similarly, $p_i(t+1) = 1 - \prod_{s_k \in S_i(t+1)} (1 - p_i^k)$ and $p_i^f(t+1) = 1 - \prod_{s_k \in S_i^f(t+1)} (1 - p_i^k)$. Then we have

$$p_i(t) = 1 - \big(1 - p_i(t+1)\big)\big(1 - p_i^f(t+1)\big) = p_i^f(t+1) + p_i(t+1)\big(1 - p_i^f(t+1)\big).$$

Therefore, $p_i(t+1) = \dfrac{p_i(t) - p_i^f(t+1)}{1 - p_i^f(t+1)}$. From Equation (9), which expresses the FTNS sensing coverage condition for any time instant, we have

$$p_i(t+1) \ge p_{th} \;\Rightarrow\; \frac{p_i(t) - p_i^f(t+1)}{1 - p_i^f(t+1)} \ge p_{th},$$

which implies that $p_i^f(t+1) \le \dfrac{p_i(t) - p_{th}}{1 - p_{th}}$.


From the proof of Theorem 7, we see that $p_i^f(t+1)$ represents the sensing coverage loss at time $t+1$ at grid point $g_i$ caused by the failing nodes in $S_i^f(t+1)$. To satisfy the coverage probability threshold requirement, $p_i^f(t+1)$ must not exceed $(p_i(t) - p_{th})/(1 - p_{th})$. In other words, if we can keep the coverage loss at each time instant below this bound during the operational lifetime of the sensor network for all grid points on the field, the sensor network is able to tolerate up to $\Omega$ nodes failing simultaneously. When $p_i(t)$ drops, the bound on the coverage loss from failing nodes at the next time instant, i.e., on $p_i^f(t+1)$, becomes tighter, since $(p_i(t) - p_{th})/(1 - p_{th})$ decreases when $p_i(t)$ decreases. This can also be used as a warning criterion to inform the base station whether a current node may lose sensing coverage over its sensing area.
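The identity and the bound from the proof of Theorem 7 can be checked numerically. In the sketch below, the detection probabilities are arbitrary illustrative numbers (in the paper they would follow from Equation (1)), detections are assumed independent so that coverage composes as in Eq. (8), and the helper names are hypothetical.

```python
from math import isclose, prod


def coverage(probs):
    """Collective coverage 1 - prod(1 - p), as in Eq. (8)."""
    return 1.0 - prod(1.0 - p for p in probs)


def max_tolerable_loss(p_now, p_th):
    """Right-hand side of Eq. (10)."""
    return (p_now - p_th) / (1.0 - p_th)


surviving = [0.6, 0.5]  # nodes in S_i(t+1)
failing = [0.7]         # nodes in S_i^f(t+1)

p_t = coverage(surviving + failing)  # p_i(t)     = 0.94
p_next = coverage(surviving)         # p_i(t+1)   = 0.80
p_f = coverage(failing)              # p_i^f(t+1) = 0.70

# Identity used in the proof: p_i(t) = p_i^f + p_i(t+1) (1 - p_i^f).
assert isclose(p_t, p_f + p_next * (1.0 - p_f))

for p_th in (0.75, 0.85):
    # Eq. (10) holds exactly when the surviving coverage meets the threshold.
    print(p_th, p_f <= max_tolerable_loss(p_t, p_th), p_next >= p_th)
# prints "0.75 True True", then "0.85 False False"
```

With $p_{th} = 0.75$ the admissible loss is $0.19/0.25 = 0.76 \ge 0.7$, whereas with $p_{th} = 0.85$ it shrinks to $0.09/0.15 = 0.6 < 0.7$, matching the observation that the bound tightens as $p_i(t)$ approaches $p_{th}$.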

Note that the fault tolerance problem for sensing coverage differs from the fault tolerance problem for communication connectivity discussed in Section 4, since there is no direct relationship between the number of failing nodes and the coverage loss $p_i^f(t)$. For example, for $g_i$ with $|S_i^f(t)| = 1$, $p_i^f(t)$ might be the same as $p_j^f(t)$ for $g_j$ where $|S_j^f(t)| = 1$, 2, 3, or even higher. This is because, for any grid point $g_i$, $p_i^f(t)$ is not directly related to the number of nodes that can detect $g_i$ but rather to the distances from these nodes to $g_i$, as defined by Equation (1).
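A small numeric illustration of this point, with made-up detection probabilities standing in for the distance-dependent values of Equation (1) and independent detections assumed: one close node with $p = 0.9$ and three distant nodes with $p = 1 - 0.1^{1/3} \approx 0.536$ each cause the same coverage loss of 0.9, while three weak detectors with $p = 0.2$ cause a much smaller loss. The helper name is hypothetical.

```python
from math import isclose, prod


def coverage_loss(failing_probs):
    """p_i^f = 1 - prod(1 - p) over the failing nodes that detect g_i."""
    return 1.0 - prod(1.0 - p for p in failing_probs)


one_close = coverage_loss([0.9])
p_far = 1.0 - 0.1 ** (1.0 / 3.0)
three_far = coverage_loss([p_far] * 3)
three_weak = coverage_loss([0.2] * 3)

print(round(one_close, 3), round(three_far, 3), round(three_weak, 3))  # 0.9 0.9 0.488
```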


5.2.  Distributed  Approach 

We next propose a coverage-centric fault tolerance algorithm that can be executed in a distributed manner and requires much less computation than the centralized case. Without loss of generality, assume $r_c \ge 2r_s$, i.e., $S_i \subseteq N_k$. For a grid point $g_i \in A_k$ corresponding to node $s_k \in S_i^a \subseteq S_a$, the maximum coverage loss happens when all nodes in $S_i^a$ fail. In this case, the coverage loss for $g_i$, denoted as $q_i(S_i^a)$, is given as $q_i(S_i^a) = 1 - \prod_{s_k \in S_i^a} (1 - p_i^k)$. Let $S_i^\Omega \subseteq S_i^s$ be the set of FT nodes for grid point $g_i$, where $S_i^s$ is the set of sleeping nodes that can detect $g_i$. The coverage compensation from $S_i^\Omega$, denoted as $q_i(S_i^\Omega)$, is given as $q_i(S_i^\Omega) = 1 - \prod_{s_k \in S_i^\Omega} (1 - p_i^k)$. Let $q_i(S_i^a \cup S_i^\Omega)$ be the coverage from both the active nodes and the FT nodes for $g_i$. Similarly, $q_i(S_i^a \cup S_i^\Omega) = 1 - \prod_{s_k \in S_i^a \cup S_i^\Omega} (1 - p_i^k)$. Assuming that the maximum coverage loss happens at time instant $t+1$, i.e., $S_i(t) = S_i^a \cup S_i^\Omega$, $S_i(t+1) = S_i^\Omega$, and $S_i^f(t+1) = S_i^a$, we accordingly have $p_i(t) = q_i(S_i^a \cup S_i^\Omega)$, $p_i(t+1) = q_i(S_i^\Omega)$, and $p_i^f(t+1) = q_i(S_i^a)$. From Equation (10), if the following is satisfied for all grid points in the sensing area $A_k$ of $s_k$, then node $s_k$ is able to tolerate the maximum number of failing active nodes within its own sensing area without losing sensing coverage:


$$\forall g_i \in A_k, \quad q_i(S_i^a) \le \frac{q_i(S_i^a \cup S_i^\Omega) - p_{th}}{1 - p_{th}}. \tag{11}$$


Equation (11) requires $\sum_{j=1}^{|S_i^s|} \binom{|S_i^s|}{j}$ evaluations for each of the $|A_k|$ grid points within $s_k$'s sensing area. When each active node executes the evaluation procedure described by Equation (11), the maximum total number of evaluations is $\sum_{g_i \in Q} \sum_{j=1}^{|S_i^s|} \binom{|S_i^s|}{j}$. However, note that $q_i(S_i^a \cup S_i^\Omega) = q_i(S_i^a) + q_i(S_i^\Omega) - q_i(S_i^a)\, q_i(S_i^\Omega)$; therefore, from Equation (11), we have


$$q_i(S_i^a) \le \frac{q_i(S_i^a \cup S_i^\Omega) - p_{th}}{1 - p_{th}} \iff q_i(S_i^\Omega) \ge p_{th}, \tag{12}$$


which corresponds to the analysis in Theorem 7. Equation (12) implies that we can design the fault-tolerance node selection for sensing coverage in a much less computationally expensive way: it suffices to check the coverage provided by the FT nodes alone, independently of $q_i(S_i^a)$. Figure 4 shows the pseudocode for the coverage-centric fault tolerance node selection algorithm.
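The equivalence in Eq. (12) is what lets each grid point be checked using the FT nodes alone. A randomized Python check of that equivalence is sketched below, assuming independent detections (so that $q_i(S_i^a \cup S_i^\Omega) = q_i(S_i^a) + q_i(S_i^\Omega) - q_i(S_i^a)q_i(S_i^\Omega)$) and $q_i(S_i^a) < 1$; the probabilities and thresholds are random test values, the helper names are hypothetical, and trials within $10^{-6}$ of the boundary are skipped to avoid floating-point ties.

```python
import random
from math import prod


def q(probs):
    """Coverage 1 - prod(1 - p) of a set of nodes at one grid point."""
    return 1.0 - prod(1.0 - p for p in probs)


def condition_11(active_probs, ft_probs, p_th):
    return q(active_probs) <= (q(active_probs + ft_probs) - p_th) / (1.0 - p_th)


def condition_12(ft_probs, p_th):
    return q(ft_probs) >= p_th


rng = random.Random(7)
mismatches = checked = 0
for _ in range(10_000):
    active = [rng.uniform(0.0, 0.95) for _ in range(rng.randint(1, 4))]
    ft = [rng.uniform(0.0, 0.95) for _ in range(rng.randint(0, 4))]
    p_th = rng.uniform(0.5, 0.95)
    if abs(q(ft) - p_th) < 1e-6:
        continue  # too close to the boundary for floating point
    checked += 1
    mismatches += condition_11(active, ft, p_th) != condition_12(ft, p_th)

print(checked, mismatches)
```

Algebraically, the difference between the two sides of Eq. (11) equals $(1 - q_i(S_i^a))(q_i(S_i^\Omega) - p_{th})/(1 - p_{th})$, so its sign is that of $q_i(S_i^\Omega) - p_{th}$ whenever $q_i(S_i^a) < 1$.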

As shown in Fig. 4, to select the minimum number of FT nodes without a thorough evaluation over all subsets of nodes in $S_i^s$, we first construct $L_i$ from $S_i^s$, where $L_i$ is a list of the nodes in $S_i^s$ sorted in descending order of their individual coverage of grid point $g_i$. For any $s_k \in S_i^s$, the corresponding element in $L_i$ is denoted by $l(s_k)$, which gives the position of $s_k$ in the list $L_i$. Therefore, for any two different nodes $s_{k_1}, s_{k_2} \in S_i^s$, $l(s_{k_1}) < l(s_{k_2})$ if


Procedure DistCovCentricFTNSelection ($s_k$)

01 Set $S_k^\Omega = \emptyset$;
02 For $\forall g_i \in A_k$ /* Check if current FT nodes in $S_k^\Omega$ are adequate for fault tolerance at $g_i$. */
03   If $q_i(S_k^\Omega) \ge p_{th}$ Continue; End /* Find FT nodes, i.e., $S_i^\Omega$, for $g_i$. */
04   Construct the sorted list $L_i$ from $S_i^s$;
05   For $j = |L_i|$ to 0 /* $L_i(1, \ldots, 0) = \emptyset$, so $q_i(L_i(1, \ldots, 0)) = 0$ */
06     If $q_i(L_i(1, \ldots, j)) \ge p_{th}$ Continue; End
07     If $j == |L_i|$ Break; End /* Not enough nodes in $S_i^s$ for fault tolerance. */
08     Set $S_i^\Omega = L_i(1, \ldots, j+1)$; $S_k^\Omega = S_k^\Omega \cup S_i^\Omega$; Break;
09   End
10 End


FIGURE 4 Pseudocode for the distributed coverage-centric fault tolerance nodes selection.
$p_i^{k_1} > p_i^{k_2}$. We denote the length of the list $L_i$ as $|L_i|$, where $|L_i| = |S_i^s|$. We define the position $l(s_k)$ as a positive integer, where $l(s_k) = 1$ if $s_k$ is the first element in $L_i$ and $l(s_k) = |L_i|$ if $s_k$ is the last element in $L_i$. We refer to a subset containing a single node $s_k$ at the $j$-th position in $L_i$ by $L_i(j)$, i.e., $L_i(j) = \{s_k \mid l(s_k) = j,\ 1 \le j \le |L_i|\}$. Furthermore, we use $L_i(j_1, j_2, \ldots, j_u)$ to denote the subset of nodes $\{s_{k_1}, s_{k_2}, \ldots, s_{k_u} \mid l(s_{k_1}) = j_1, l(s_{k_2}) = j_2, \ldots, l(s_{k_u}) = j_u, \text{ and } 1 \le j_1 < j_2 < \cdots < j_u \le |L_i|\}$. Thus, for a given grid point $g_i$, when there are enough nodes in $S_i^s$ to serve as FT nodes for $g_i$, the procedure in Fig. 4 is able to generate the subset of FT nodes from $S_i^s$ with the minimum number of FT nodes for $g_i$. Note, however, that to avoid the repeated selection of the same nodes for different grid points, before selecting the FT nodes for the current grid point, the coverage support from the FT nodes already selected is checked first to see if they already provide enough coverage support when active nodes fail; see line 3 in Fig. 4. Therefore, even though the number of FT nodes selected is locally minimum for a given grid point, it is not necessarily a global minimum.
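The per-grid-point selection of Fig. 4 can be sketched in Python as follows. This is an illustrative, centralized rendering with hypothetical names: candidates for a grid point are given as a dictionary from sleeping node IDs to detection probabilities, detections are assumed independent, and instead of scanning $j$ downward the sketch scans prefixes of $L_i$ in increasing length, which returns the same shortest prefix and also covers the case where the single best node suffices. Taking a prefix of the descending list is what makes the result minimal for one grid point: since $1 - \prod(1 - p)$ increases with every $p$, the first $j$ entries of $L_i$ give the largest coverage among all subsets of size $j$.

```python
from math import prod


def coverage(probs):
    """1 - prod(1 - p): coverage of one grid point by a set of nodes."""
    return 1.0 - prod(1.0 - p for p in probs)


def select_ft_for_grid_point(candidates, p_th):
    """Shortest prefix of L_i (descending detection probability) whose
    coverage reaches p_th, or None if all candidates together fall short."""
    ranked = sorted(candidates, key=candidates.get, reverse=True)  # L_i
    for j in range(1, len(ranked) + 1):
        if coverage(candidates[n] for n in ranked[:j]) >= p_th:
            return ranked[:j]
    return None


def select_ft_for_node(grid_candidates, p_th):
    """Loop over the grid points of A_k, reusing FT nodes already chosen
    when they cover the current grid point well enough (line 3 of Fig. 4)."""
    chosen = set()
    for candidates in grid_candidates.values():
        already = [candidates[n] for n in chosen if n in candidates]
        if coverage(already) >= p_th:
            continue
        picked = select_ft_for_grid_point(candidates, p_th)
        if picked is not None:
            chosen.update(picked)
    return chosen


cands = {"a": 0.5, "b": 0.8, "c": 0.3, "d": 0.6}
print(select_ft_for_grid_point(cands, 0.9))   # ['b', 'd']
print(select_ft_for_grid_point(cands, 0.99))  # None
```

In the second call, even all four candidates together reach only $1 - 0.2 \cdot 0.4 \cdot 0.5 \cdot 0.7 = 0.972$, which corresponds to the "not enough nodes" exit on line 7 of Fig. 4.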

Note that the evaluation procedure is per grid point, and it can be executed on either a sleeping node or an active node. For any $g_i \in Q$, only one node needs to perform the selection of FT nodes for $g_i$. This implies that the total number of nodes required for executing such an evaluation procedure is …, where $A$ is the area of the surveillance field (assuming that either $r_c \ge 2r_s$ or restricted-hop knowledge is available). Also note that in Fig. 4, there is no need to calculate $q_i(S_i^a)$ every time, since it is available from the previous stage when $S_a$ is determined. Further computation can be reduced by temporarily storing $q_i(S_i^\Omega)$ for the current grid point for evaluation at the next grid point, where $q_i(S_i^a \cup S_i^\Omega)$ can be obtained as $q_i(S_i^a \cup S_i^\Omega) = q_i(S_i^\Omega) + q_i(S_i^a) - q_i(S_i^\Omega)\, q_i(S_i^a)$.


The sorting procedure needed to construct $L_i$ from $S_i^s$ has a time complexity of $O(\Delta \log \Delta)$, where $\Delta$ is the maximum node degree. The pseudocode from line 5 to line 9 in Fig. 4 for FT node selection has a time complexity of $O(\Delta)$. Since the distributed coverage-centric fault-tolerant procedure in Fig. 4 is carried out per grid point, the overall time complexity of the distributed coverage-centric fault-tolerant node selection is $O(m\Delta(1 + \log \Delta)) = O(m\Delta \log \Delta)$, where $m$ is the number of grid points. The next theorem shows that the procedure of Fig. 4 leads to the smallest number of FT nodes needed to satisfy the coverage threshold for a given grid point $g_i$. The proof is given in the appendix.


Theorem 8. For a grid point $g_i$, the distributed coverage-centric fault-tolerance node selection procedure given by the pseudocode in Fig. 4 gives the minimum number of fault-tolerance nodes.

As shown in Figs. 3 and 4, the proposed scheme does not require a centralized server to determine backup nodes for the existing backbone. FT nodes are designated in a distributed fashion, and this procedure requires only localized communication (single-hop or restricted-hop communication between nodes). The proposed self-organization approach for fault tolerance is therefore scalable, which makes it suitable for ad hoc sensor networks with a large number of deployed nodes.
6. Simulation and Discussion

We have implemented CCANS in ns2 and integrated it as a module in the ESP AESOP protocol. The Emergent Surveillance Plexus (ESP) [34] is a Multidisciplinary University Research Initiative (MURI) whose goal is to advance the surveillance capabilities of wireless sensor networks. It involves participants from Pennsylvania State University, University of California at Los Angeles, Duke University, University of Wisconsin, Cornell University, and Louisiana State University. AESOP stands for An Emergent-Surveillance-Plexus Self-Organizing Protocol, which is designed for target tracking in wireless sensor networks with high tracking quality and energy efficiency; a more detailed description of the AESOP protocol can be found in [5].


6.1.  Simulation  Results 

In a simulation of the proposed fault-tolerant self-organization algorithms, we first collect the data from the distributed CCANS procedure described in Section 3. We next evaluate the proposed distributed FTNS procedure in MATLAB, using the data collected from CCANS as inputs. The data from CCANS contain the locations of the sensor nodes after deployment and their final state decisions. The random deployments contain 150, 200, 250, 300, 350, and 400 nodes, respectively, on a 50 × 50 grid representing a 50 m × 50 m sensor field. All nodes have the same maximum communication radius $r_c = 20$ m and maximum sensing range $r_s = 10$ m. The value of $\Omega$ is set to the number of active nodes. Figures 5–8 show the simulation results for the distributed fault-tolerance self-organization procedure.

Figure 5(b)(i) shows the results obtained for connectivity-oriented selection of FT nodes. Note that the percentage of FT nodes decreases at nearly the same rate as the percentage of active nodes. This is because the connectivity-oriented FT node selection algorithm is executed in a distributed manner and each node uses only one-hop knowledge. Note also that the percentage of FT nodes is lower than the percentage of active nodes determined by CCANS. This is because CCANS considers both communication connectivity and sensing coverage in selecting active nodes. Figure 5(b)(ii) shows the results for coverage-centric selection of FT nodes. Since the coverage-centric fault-tolerance node selection procedure given by Fig. 4 has been proven to generate the minimum number of fault-tolerance nodes for each grid point, the percentage shown in Fig. 5(b)(ii) is much lower than the percentage of active nodes from CCANS.

[Figure 5 panels plot the percentage of FT nodes against the number of total nodes; recovered panel titles: (a)(i) Percentage of FT nodes for connectivity vs. active nodes; (b)(i) Percentage of FT nodes vs. active nodes in FTNS-1.]

FIGURE 5 Simulation results: (a) Percentage of FT nodes: (i) FT nodes for connectivity only; (ii) FT nodes for coverage only. (b) Percentage of FT nodes for the distributed FTNS procedure (with both coverage and connectivity concerns): (i) FTNS-1: Stage 1 selects FT nodes for coverage and Stage 2 selects FT nodes for connectivity; (ii) FTNS-2: Stage 1 selects FT nodes for connectivity and Stage 2 selects FT nodes for coverage.

[Figure 6 recovered panel titles: (i) Coverage Loss; (ii) Number of Failing Active Nodes; (iv) Number of Activated FT Nodes.]

FIGURE 6 Simulation results: (a) Effect of failing active nodes vs. activated FT nodes for FTNS-1: (i) Average coverage loss from failing active nodes; (ii) Average number of activated FT nodes; (iii) Average coverage loss from failing active nodes; (iv) Average number of activated FT nodes; (b) Effect of failing active nodes vs. activated FT nodes for FTNS-2: (i) Average coverage loss from failing active nodes; (ii) Average number of activated FT nodes; (iii) Average coverage loss from failing active nodes; (iv) Average number of activated FT nodes.

[Figure 7 panels: (i) Grid Point Coverage for FTNS-1 with $p_{th} = 0.8$; (ii) Grid Point Coverage for FTNS-2 with $p_{th} = 0.8$.]

FIGURE 7 Average grid point coverage when active nodes fail during the simulation.

The distributed fault-tolerance node selection procedure consists of two stages. We consider two implementations, namely "FTNS-1" and "FTNS-2". In FTNS-1, the first stage is the coverage-centric selection of FT nodes (FTNS-1 Stage 1) and the second stage is the connectivity-centric selection of FT nodes (FTNS-1 Stage 2). In FTNS-2, the first stage is the connectivity-centric selection of FT nodes (FTNS-2 Stage 1) and the second stage is the coverage-centric selection of FT nodes (FTNS-2 Stage 2). Figure 5(b) presents the results for the distributed FTNS algorithm. In both FTNS-1 and FTNS-2, the FT nodes already selected in Stage 1 are checked first in Stage 2 to see whether they already provide enough sensing coverage for fault tolerance. This decreases the number of FT nodes needed in Stage 2 of coverage-centric FT node selection, as shown in Fig. 5(b). Note that in Fig. 5(b)(i), the percentage of FT nodes after Stage 1 is the same as the percentage of FT nodes at the end of the FTNS procedure. This is because $r_c = 2r_s$ in this scenario. As shown in [31], when $r_c = 2r_s$, connectivity is automatically guaranteed by the subset of nodes needed to maintain sensing coverage.
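The effect of the stage order can be illustrated with a toy model. In the sketch below, each requirement (coverage or connectivity) is abstracted as a set of hypothetical items that must each be served by at least one selected FT node; this is a simplification, not the paper's coverage-probability test or its connectivity check. What it does show is the seeding step of the two-stage procedure: Stage 2 starts from the Stage 1 selection and adds only what is still missing. All names (`stage`, `ftns`, the item labels) are hypothetical.

```python
from typing import Dict, List, Set

Provides = Dict[str, Set[str]]  # candidate FT node -> requirement items it serves


def stage(order: List[str], provides: Provides, need: Set[str],
          seed: List[str]) -> List[str]:
    """Greedy stage: keep the seed, then scan candidates in `order` and add a
    candidate only if it serves an item that is still unserved. Stops as soon
    as every item in `need` is served (or the candidates run out)."""
    selected = list(seed)
    served = set().union(*(provides[c] for c in selected)) & need
    for c in order:
        if served == need:
            break
        if c not in selected and provides[c] & (need - served):
            selected.append(c)
            served |= provides[c] & need
    return selected


def ftns(order, cov_provides, con_provides, cov_need, con_need, coverage_first):
    """FTNS-1 if coverage_first is True, FTNS-2 otherwise.
    Returns (selection after Stage 1, selection after Stage 2)."""
    if coverage_first:
        first, second = (cov_provides, cov_need), (con_provides, con_need)
    else:
        first, second = (con_provides, con_need), (cov_provides, cov_need)
    s1 = stage(order, first[0], first[1], seed=[])
    s2 = stage(order, second[0], second[1], seed=s1)  # Stage 2 reuses Stage 1
    return s1, s2
```

Because Stage 2 is seeded with the Stage 1 nodes, a Stage 1 selection that already satisfies the second requirement leaves nothing to add, which is the behavior the text attributes to FTNS-1 when $r_c = 2r_s$.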

Next, we simulate failures of active nodes to show that FT nodes can provide coverage and connectivity when active nodes fail. The results are shown in Fig. 6. The sensor network layout and configuration are the same as those in Fig. 5. We use a simplified model for generating failing active nodes. For a total simulation time of 200 minutes, every 10 minutes we select a random number of nodes from the currently alive active nodes and designate them as failing nodes. The neighboring FT nodes determine that these nodes have failed. As shown in Fig. 6, the failure of active nodes leads to the activation of their designated FT neighbors. The coverage lost from the failing active nodes is compensated by the coverage support from the activated FT nodes. The simulation stops when all active nodes have failed. Fig. 7 shows the change in coverage probability for grid points in the sensor field. Note that the average grid point coverage probability decreases with time. This is because the coverage-centric FT node selection selects only the minimum number of FT nodes to save energy; its goal is not to maximize coverage. However, at any time instant, the coverage probability is always higher than the required coverage probability threshold $p_{th} = 0.8$. Also note that at any time instant, connectivity is guaranteed by the activated FT nodes and the alive active nodes for both FTNS-1 and FTNS-2.
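The failure model just described can be sketched as a simple loop. This is an illustration under assumptions, not the simulation code used for Fig. 6: the FT assignment `ft_of` is taken as given (hypothetically, the output of an FTNS run), the number of failures per round is drawn uniformly between 1 and a hypothetical cap `max_per_round`, and only the originally active nodes are failed; activated FT nodes are just recorded.

```python
import random
from typing import Dict, List, Set


def inject_failures(ft_of: Dict[int, List[int]], total_minutes: int = 200,
                    step: int = 10, max_per_round: int = 3, seed=None):
    """Return a log of (minute, failed active nodes, newly activated FT nodes).

    ft_of maps each active node to its designated FT neighbors (hypothetical
    input). Assumption: activated FT nodes are not themselves failed here.
    """
    rng = random.Random(seed)
    alive = sorted(ft_of)
    activated: Set[int] = set()
    log = []
    for minute in range(step, total_minutes + 1, step):
        if not alive:
            break  # every active node has failed: the simulation stops
        k = rng.randint(1, min(max_per_round, len(alive)))
        failed = rng.sample(alive, k)
        alive = [s for s in alive if s not in failed]
        # Designated FT neighbors of the failed nodes wake up; an FT node
        # shared by several failed nodes is activated only once.
        new = {f for s in failed for f in ft_of[s]} - activated
        activated |= new
        log.append((minute, sorted(failed), sorted(new)))
    return log
```

Coverage and connectivity checks after each round would be layered on top of this log; they depend on the coverage model and are omitted here.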


Y. Zou and K. Chakrabarty

[Figure 8 panels: (i) Sending Data Size in FTNS vs. Number of Nodes; (ii) Receiving Data Size in FTNS vs. Number of Nodes (vertical axis in units of $10^4$). Horizontal axis: Total Number of Nodes.]

FIGURE 8 Communication data message size in FTNS for FT nodes selection.


6.2. Discussion

Note that the upper bounds on the number of fault-tolerance nodes given in previous sections are important because they can be used as a guideline for the initial deployment of sensor nodes to achieve fault-tolerant self-organization. For example, as shown by Fig. 5, we can deploy enough sensor nodes to provide the required level of fault tolerance in the sensor network. The lower bound on the number of fault-tolerance nodes is useful since it can be used as a baseline for comparing different heuristics. Note that the problem of finding a minimum connected dominating set (MCDS) for a general graph is NP-complete, and it is hard to approximate. The original work of using an MCDS as a backbone for routing, by Bharghavan and Das [4], has an approximation ratio of $3H(\Delta)$, where $\Delta$ is the maximum node degree and $H(\Delta) = \sum_{i=1}^{\Delta} 1/i$ is the $\Delta$th harmonic number. A comparison of recent distributed algorithms for forming a CDS described in [3,25,28] can be found in [3]. In this paper, we used the distributed algorithm proposed in [28] for its simplicity of implementation. However, the proposed fault-tolerance procedure is not limited to any particular heuristic for selecting backbone nodes to form a CDS. In our case, the lower bound is not on the set of all nodes but on the subset of nodes that are not selected as backbone nodes, i.e., candidate fault-tolerance nodes. We refer to this as the constrained minimum connected dominating set problem. Therefore, heuristics in the existing literature, such as [3,4,25,28], can be directly used to approximate the MCDS by applying them only to the subset of non-backbone nodes.
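As an illustration of applying an existing CDS heuristic only to the non-backbone nodes, the sketch below restricts the graph to a candidate set and then runs the well-known marking rule of Wu and Li (a node marks itself if it has two neighbors that are not adjacent to each other). This rule is a stand-in chosen for brevity; it is not claimed to be the algorithm of [28], no pruning rules are applied, and on a complete subgraph it marks nothing. The helper `harmonic` just evaluates $H(\Delta)$ from the text exactly. All function names are hypothetical, and adjacency is assumed to be symmetric.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, Set


def harmonic(n: int) -> Fraction:
    """H(n) = sum_{i=1}^{n} 1/i, computed exactly."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def induced(adj: Dict[int, Set[int]], keep: Set[int]) -> Dict[int, Set[int]]:
    """Subgraph of adj induced by the nodes in keep (e.g., non-backbone nodes)."""
    return {v: adj[v] & keep for v in keep}


def marking_cds(adj: Dict[int, Set[int]]) -> Set[int]:
    """Wu-Li marking rule: v is marked if two of its neighbors are not adjacent."""
    marked = set()
    for v, nbrs in adj.items():
        if any(b not in adj[a] for a, b in combinations(nbrs, 2)):
            marked.add(v)
    return marked


if __name__ == "__main__":
    # Node 4 plays the role of a backbone node; the rule runs on {0, 1, 2, 3}.
    adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {0, 2}}
    print(marking_cds(induced(adj, {0, 1, 2, 3})))
    print(3 * harmonic(3))  # the 3H(Delta) ratio for Delta = 3
```

Restricting the graph first is the only change needed to turn an unconstrained heuristic into one for the constrained problem; the heuristic itself is untouched.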
The proposed distributed fault-tolerance node selection procedure is a localized algorithm. Localized algorithms are a special type of distributed algorithm in which only a subset of the nodes in the wireless sensor network participates in sensing, communication, and computation [18]. Each stage of the proposed distributed fault-tolerance node selection procedure requires only local knowledge and a constant number of rounds of message exchange within the neighborhood. From the discussion in Subsections 4.3 and 5.2, the total time complexity of the distributed FTNS is $O(m\Delta\log\Delta + \Delta^3)$. For message complexity, both stages in FTNS require the exchange of a constant number of messages within the neighborhood. Each active node in the backbone first carries out the computation for connectivity checking and coverage evaluation to select a subset of nodes from its sleeping neighbors; it then broadcasts the list of selected FT neighbors within its neighborhood. The designated FT nodes need not be activated until the active nodes fail. In Fig. 8, we show the evaluation of the communication data size for FT node selection.

In FT node selection, active nodes send a message containing the list of node ids of the designated FT neighbors. Neighbors of active nodes search the received id list and set themselves as FT nodes for that active node, i.e., FT nodes that can be switched to the active state by their failing active neighbors. The designated FT nodes then send an acknowledgment message back to the active nodes to confirm the FT node assignment. Assuming that there is no packet loss, this takes 2 rounds of communication within the neighborhood of active nodes. The message size complexity is then $O(\Delta)$. For the activation of FT nodes, we assume that designated FT nodes periodically poll their active neighbors to check whether they are still alive. The polling frequency depends on the application requirements and the failure distribution of sensor nodes, and polling should not consume excessive energy and bandwidth. The problem of determining the polling frequency is not considered in this paper. Note that it is possible to simply let all sleeping nodes do the polling without designating any fault-tolerance nodes. However, this also means that when an active node fails, all its sleeping neighbors have to become active. This adversely affects the potential for extending the lifetime of a densely deployed sensor network. Figure 8 shows the average communication data size for both FTNS-1 and FTNS-2.
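The two-round assignment exchange can be sketched as follows, assuming no packet loss as in the text. Message size is counted in node ids, which is a hypothetical unit chosen for illustration; the function name `assign_ft` and its inputs are also hypothetical. Round 1 costs each active node one broadcast whose size equals its FT list length, which is at most its degree, matching the $O(\Delta)$ message size stated above.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def assign_ft(nbrs: Dict[int, Set[int]], ft_of: Dict[int, List[int]]
              ) -> Tuple[Dict[int, Set[int]], Counter, Counter]:
    """Two-round FT assignment without packet loss.

    Round 1: each active node broadcasts its list of designated FT ids.
    Round 2: each neighbor that finds its own id in a list acknowledges the
    sender with a one-unit message.
    Returns (active nodes each FT node serves, units sent, units received).
    """
    role: Dict[int, Set[int]] = {}
    sent, received = Counter(), Counter()
    # Round 1: one broadcast per active node, heard by every one-hop neighbor.
    for a, ids in ft_of.items():
        sent[a] += len(ids)
        for v in nbrs[a]:
            received[v] += len(ids)
            if v in ids:
                role.setdefault(v, set()).add(a)  # v is an FT node for a
    # Round 2: each designated FT node acknowledges each active node it serves.
    for v, actives in role.items():
        for a in actives:
            sent[v] += 1
            received[a] += 1
    return role, sent, received
```

An FT node can appear in the lists of several active nodes and then serves all of them, which is the sharing that the bounds in the Appendix exploit.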

7. Conclusions

In this paper, we have investigated fault tolerance for coverage and connectivity in wireless sensor networks. Fault tolerance is necessary to ensure robust operation for surveillance and monitoring applications. Since wireless sensor networks are made up of inexpensive nodes that operate in harsh environments, the likelihood of node failures must be considered. We have characterized the amount of redundancy required in the network for fault tolerance. Based on an analysis of the redundancy necessary to maintain communication connectivity and sensing coverage, we have proposed the distributed FTNS algorithm for fault-tolerant self-organization. FTNS provides a high degree of fault tolerance: even when all the active nodes fail simultaneously, coverage and connectivity in the network are not affected. The proposed distributed FTNS approach is scalable and requires only localized communication. We have implemented FTNS in MATLAB and presented representative simulation results.
Appendix

Proof of Theorem 2

Proof. Since the nodes in $N_k$ are connected, we know from Theorem 1 that at most 10 nodes are needed to keep all nodes in $N_k$ connected if $s_k$ fails. Moreover, the active neighbors of $s_k$ are connected to all other neighbors of $s_k$; since $\Omega = 1$, a routing path from any neighbor inside the communication region of $s_k$ to any node outside that region is not broken. Therefore, communication fault tolerance is maintained if $s_k$ fails. ■


Proof of Corollary 1

Proof. Since $|S_a| > 1$, $\forall s_k \in S_a$, $\Delta_k^a \ge 1$. Since $\Delta_k^a > 0$, $s_k$ has at least one active neighbor; let us assume that $s_i \in N_k$ is such a neighbor. Then $s_i$ is in one of the six sector areas shown in Fig. 2(a). Based on the proof of Theorem 1, starting from $s_i$ we need at most 9 more nodes to keep all nodes in $N_k$ connected. As we do not know the locations of the active neighbors inside the sectors in Fig. 2(a), it is possible that all active neighbors of $s_k$ are located close to each other. In this case, when $s_k$ fails, all 9 FT nodes are needed. ■

Redundancy Analysis for Wireless Sensor Networks


Proof of Theorem 3

Proof. Let $N_k^1 \subseteq N_k$ be the set of FT neighbor nodes for $s_k$ in a 1-DOFT sensor network. If $|V_a| = 1$, then $S_a$ contains only one active node. Assume $S_a = \{s_k\}$. Then $|S_f^1| = |N_k^1| = \Gamma_k$. From Theorem 1, $\forall s_k \in S_a$, it is sufficient to have $\Gamma_k = 10$ for fault-tolerant communication connectivity within the communication region of $s_k$. Furthermore, note that if $|V_a| > 1$, then $\forall s_k \in S_a$, $\Delta_k^a \ge 1$. From Corollary 1, $\forall s_k \in S_a$, it is sufficient to have $\Gamma_k = 9$ for fault-tolerant communication connectivity within the communication region of $s_k$. Therefore,

$$|S_f^1| = \Big|\bigcup_{s_k \in S_a} N_k^1\Big| \le \sum_{s_k \in S_a} |N_k^1| = \sum_{s_k \in S_a} \Gamma_k \le \sum_{s_k \in S_a} 9 = 9|V_a|. \tag{13}$$

However, note that $s_k$ can share FT nodes with its active neighbors. Assume that $s_i \in N_k$ is an active neighbor of $s_k$, and recall the six sectors in Fig. 2(a). Then $s_i$ is connected to at least one FT node in the set $N_k^1$ corresponding to $s_k$. Let this FT node be denoted by $s_j$. Therefore, when $s_i$ selects its own FT nodes, it also selects the existing FT node $s_j$. Since each pair of connected active nodes shares at least one FT node, the total number of times that FT nodes have been shared is at least $|E_a|$. This implies that Equation (13) over-counts the size of $S_f^1$ by at least $|E_a|$. Therefore, $|S_f^1| \le 9|V_a| - |E_a|$. An example is shown in Fig. 9, where $s_i \in N_k$ and $s_j$ is an FT node for both $s_k$ and $s_i$. ■
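The bound just derived is easy to evaluate for a given active backbone. The sketch below computes it from a list of active nodes and undirected edges between them; the function name is hypothetical, and duplicate or reversed edges are counted once. For a chain of three active nodes, for instance, the naive count of Equation (13) gives 27, while the shared-node argument lowers it to 25.

```python
from typing import Iterable, Tuple


def theorem3_ft_bound(active: Iterable[int],
                      edges: Iterable[Tuple[int, int]]) -> int:
    """Upper bound on |S_f^1| in a 1-DOFT network, following Theorem 3:
    10 if there is a single active node, otherwise 9|V_a| - |E_a|."""
    v = set(active)
    # Undirected edges between active nodes; (1, 2) and (2, 1) count once.
    e = {frozenset(p) for p in edges if p[0] != p[1]}
    if len(v) == 1:
        return 10
    return 9 * len(v) - len(e)
```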


Proof of Theorem 4

Proof. For an $\Omega$-DOFT ($1 < \Omega \le \Delta_k^a + 1$) sensor network, consider an arbitrarily chosen active node $s_k \in S_a$. To maintain connectivity over the entire sensor network, the FT nodes selected for $s_k$ must first keep all non-failing active neighbors and sleeping neighbors connected; second, they should also provide alternative routing paths for the failing active neighbors, i.e., the network traffic going outside the communication region of $s_k$ should not be impeded. Since $s_k$ itself can be one of the $\Omega$ failing active nodes, the proof is based on the enumeration of the following three cases.


[Figure 9 legend: Fault-tolerance (FT) Nodes.]


FIGURE 9 Illustration of the proof of Theorem 3.
Case 1: If $s_k$ fails and $\Omega = \Delta_k^a + 1$. Since $s_k$ fails, $s_k$ has $\Omega - 1$ failing neighbors. Since $s_k$ needs at most 1 FT neighbor for each failing node to provide an alternative routing path for that node, it is sufficient for $s_k$ to have $\Omega - 1$ FT nodes for all failing neighbors. Furthermore, consider the connectivity within the communication region of $s_k$. If $\Omega = \Delta_k^a + 1$, all active neighbors of $s_k$ fail together with $s_k$, and from Theorem 1, it is sufficient to have 10 FT nodes for fault-tolerant communication connectivity within the communication region of $s_k$. This leads to $\Gamma_k = \Omega - 1 + 10 = \Omega + 9$.

Case 2: If $s_k$ fails and $\Omega < \Delta_k^a + 1$. Since $s_k$ fails, similar to Case 1, it is sufficient for $s_k$ to have $\Omega - 1$ FT nodes to provide alternative routing paths for the failing neighbors. Also, because $\Omega < \Delta_k^a + 1$, there exists at least one non-failing active neighbor of $s_k$. From Corollary 1, it is sufficient for $s_k$ to have 9 FT nodes for fault-tolerant communication connectivity within the communication region of $s_k$. Therefore, $\Gamma_k = \Omega - 1 + 9 = \Omega + 8$.

Case 3: If $s_k$ does not fail. All non-failing neighbors and sleeping neighbors of $s_k$ remain connected because $s_k$ is active and alive. Thus we only need FT nodes to provide alternative routing paths for the failing neighbors of $s_k$. Similar to Case 1, it is sufficient for $s_k$ to have $\Gamma_k = \Omega$ FT nodes to provide alternative routing paths for the failing neighbors. ■
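The case analysis can be summarized as a small lookup of the sufficient number of FT nodes $\Gamma_k$. This is only a restatement of Cases 1 to 3 in code, with a hypothetical function name; inputs outside the enumerated cases (more failures than $s_k$ and all its active neighbors when $s_k$ fails) raise an error instead of returning a guess.

```python
def theorem4_sufficient_ft(omega: int, active_degree: int, sk_fails: bool) -> int:
    """Sufficient number of FT nodes Gamma_k for s_k in an Omega-DOFT network.

    omega: number of simultaneously failing active nodes (Omega).
    active_degree: number of active neighbors of s_k (Delta_k^a).
    """
    if not sk_fails:
        return omega                  # Case 3: one alternative path per failure
    if omega == active_degree + 1:
        return (omega - 1) + 10       # Case 1: Theorem 1 term for the region
    if omega < active_degree + 1:
        return (omega - 1) + 9        # Case 2: Corollary 1 term for the region
    raise ValueError("outside the cases enumerated in the proof of Theorem 4")
```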


Proof of Theorem 5

Proof. Let $S_f^\Omega$ be the set of FT nodes selected when the nodes in the set $S_f$ of failing active nodes fail. Consider the connectivity of the corresponding subgraph $G_f$ of $S_f$, with edge set $E_f$. There are three possible cases to consider: 1) $G_f$ is connected. This corresponds to the case that $\forall s_i, s_j \in S_f$ ($s_i \neq s_j$), there is a routing path from $s_i$ to $s_j$ in $S_f$. 2) $G_f$ is not connected and $E_f = \emptyset$. This means that none of the nodes in $S_f$ is connected to any other node in $S_f$. 3) $G_f$ is not connected and $E_f \neq \emptyset$. This corresponds to the case that $\exists s_i, s_j \in S_f$ ($s_i \neq s_j$) such that there is a routing path from $s_i$ to $s_j$ in $S_f$, and at the same time, $\exists s_i', s_j' \in S_f$, where $s_i' \neq s_i$ and $s_j' \neq s_j$, such that there is no routing path from $s_i'$ to $s_j'$ in $S_f$. Let $N_k^\Omega$ be the set of nodes selected for $s_k$ as FT nodes when $s_k$ fails. In the following, we determine an upper bound on $|S_f^\Omega|$ for each of the three cases described earlier.

Case 1: If $G_f$ is connected. Since the nodes in $G_f$ fail at the same time and $G_f$ is connected, $\forall s_k \in S_f$, at least one active neighbor of $s_k$ fails. Note that it is possible that all active neighbors of $s_k$ fail together with $s_k$ itself. Thus, from Theorem 1, at most 10 FT nodes are needed for $s_k$. Therefore,

$$|S_f^\Omega| = \Big|\bigcup_{s_k \in S_a} N_k^\Omega\Big| \le \sum_{s_k \in S_a} |N_k^\Omega| \le \sum_{s_k \in S_a} 10 = 10|V_a|.$$

From Fig. 2(b) and Fig. 2(c), when two connected nodes fail at the same time, even the smallest overlap of their communication regions contains at least 4 FT nodes, which are FT nodes shared by both failing nodes. This implies that for each edge in $E_a$, there are at least 4 shared FT nodes. Therefore, from Theorem 4,

$$|S_f^\Omega| \le 10|V_a| - 4|E_a|.$$

Case 2: If $G_f$ is not connected and $E_f = \emptyset$. Obviously, in this case, $\forall s_k \in S_f$, the number of FT nodes needed for $s_k$ is the same as the number of FT nodes needed for the case where $\Omega = 1$. Also, because $E_f = \emptyset$, $\forall s_k \in S_f$, $s_k$ has at least one non-failing neighbor. Then from Corollary 1, at most 9 FT nodes are needed for $s_k$. Therefore,

$$|S_f^\Omega| = \Big|\bigcup_{s_k \in S_f} N_k^\Omega\Big| \le \sum_{s_k \in S_f} |N_k^\Omega| \le \sum_{s_k \in S_f} 9 = 9|S_f|.$$

Case 3: If $G_f$ is not connected and $E_f \neq \emptyset$. This includes non-connected failing nodes, as discussed in Case 2, with no shared FT nodes, and connected failing nodes, as discussed in Case 1, with shared FT nodes. Since $E_f \neq \emptyset$, $|E_f| \ge 1$, i.e., at least two failing nodes are connected. Therefore,

$$|S_f^\Omega| \le 9(|V_a| - 2) + (10 \times 2 - 4) = 9|V_a| - 2.$$

■

Proof of Theorem 8

Proof. Assume that $S_1 \subseteq S_f$ is the subset generated by the procedure in Fig. 4 as the set of FT nodes for $g_i$. Assume that $S_1 = L_i(1, \ldots, j, j+1)$, i.e., $S_1$ is obtained at the $j$-th position as shown in line 8 of Fig. 4. Therefore, $q_i(L_i(1, \ldots, j, j+1)) \ge p_{th}$ and $q_i(L_i(1, \ldots, j)) < p_{th}$. Denote the node at position $j+1$ in $L_i$ as $s_{k_0}$, i.e., $l(s_{k_0}) = j+1$ and $L_i(j+1) = \{s_{k_0}\}$. Assume that $\exists S_2 \subseteq S_f$ such that $|S_2| < |S_1|$ and $q_i(S_2) \ge p_{th}$. Since both $S_1$ and $S_2$ are subsets of $S_f$, there are three cases in the relationship between $S_1$ and $S_2$, listed below.

Case 1: If $S_2 \subset S_1$. Note that $S_1 = L_i(1, \ldots, j, j+1)$.

1. If $L_i(j+1) \not\subseteq S_2$, then $S_2 \subseteq L_i(1, \ldots, j)$. From Equation (8), $q_i(S_2) \le q_i(L_i(1, \ldots, j)) < p_{th}$, which contradicts the assumption that $q_i(S_2) \ge p_{th}$.

2. If $L_i(j+1) \subseteq S_2$, let $S_2' = S_2 \setminus L_i(j+1)$. Since $|S_2| < |S_1|$, we have $S_2' \subsetneq L_i(1, \ldots, j)$. Then $\exists s_{k_1} \in L_i(1, \ldots, j) \setminus S_2'$ such that $l(s_{k_1}) < l(s_{k_0})$. Based on the definition of $L_i$, we have $p_i^{k_1} \ge p_i^{k_0}$. Therefore, from Equation (8), we have
$$q_i(S_2) = q_i(S_2' \cup L_i(j+1)) = q_i(S_2' \cup \{s_{k_0}\}) \le q_i(S_2' \cup \{s_{k_1}\}) \le q_i(L_i(1, \ldots, j)) < p_{th},$$
which contradicts the assumption that $q_i(S_2) \ge p_{th}$.

Case 2: If $S_2 \cap S_1 = \emptyset$. From the definition of $L_i$, $S_2 \subseteq L_i(j+2, \ldots, |L_i|)$. Without loss of generality, let $S_2 = L_i(u_1, u_2, \ldots, u_a)$, where $j+2 \le u_1 < u_2 < \cdots < u_a \le |L_i|$. Since $|S_2| < |S_1|$, i.e., $a \le j$, we can construct a subset $S_1'$ of $S_1$ with $|S_1'| = |S_2|$ and $S_1' = L_i(v_1, v_2, \ldots, v_a)$ such that $v_1 < u_1, v_2 < u_2, \ldots, v_a < u_a$ (for example, $v_m = m$, so that $S_1' \subseteq L_i(1, \ldots, j)$). Therefore, from Equation (8), $q_i(S_2) \le q_i(S_1') < p_{th}$. This contradicts the assumption that $q_i(S_2) \ge p_{th}$.

Case 3: If $S_2 \not\subseteq S_1$ and $S_2 \cap S_1 \neq \emptyset$. Let $S_0 = S_2 \cap S_1$, $S_1' = S_1 \setminus S_0$, and $S_2' = S_2 \setminus S_0$. Obviously $S_2' \subseteq L_i(j+2, \ldots, |L_i|)$. Since $|S_1| > |S_2|$, then
$$|S_0| + |S_1'| > |S_0| + |S_2'| \;\Rightarrow\; |S_1'| > |S_2'|.$$

1. If $L_i(j+1) \not\subseteq S_2$, then $L_i(j+1) \not\subseteq S_0$; therefore $L_i(j+1) \subseteq S_1'$. Let $S_1'' = S_1' \setminus L_i(j+1)$. Since $|S_1'| > |S_2'|$, we have $|S_1'' \cup L_i(j+1)| > |S_2'|$, and hence $|S_1''| \ge |S_2'|$. Therefore, we can construct a subset $S_1^* \subseteq S_1''$ such that $|S_1^*| = |S_2'|$. Note that $S_1^* \subseteq L_i(1, \ldots, j)$ and $S_2' \subseteq L_i(j+2, \ldots, |L_i|)$; therefore $q_i(S_1^*) \ge q_i(S_2')$. Also note that $S_0 \cup S_1^* \subseteq L_i(1, \ldots, j)$. Thus, from Equation (8), we have
$$q_i(S_2) = q_i(S_0 \cup S_2') \le q_i(S_0 \cup S_1^*) \le q_i(L_i(1, \ldots, j)) < p_{th},$$
which contradicts the assumption that $q_i(S_2) \ge p_{th}$.

2. If $L_i(j+1) \subseteq S_2$, then $L_i(j+1) \subseteq S_0$, because $L_i(j+1) \subseteq S_1$ as well. Let $S_0' = S_0 \setminus L_i(j+1)$. Since $|S_1'| > |S_2'|$, we can construct a subset $S_1^* \subset S_1'$ such that $|S_1^*| = |S_2'|$. Further note that $S_1' \setminus S_1^* \neq \emptyset$; therefore, $\exists s_{k_1} \in S_1' \setminus S_1^*$ such that $l(s_{k_1}) < l(s_{k_0})$, i.e., $p_i^{k_1} \ge p_i^{k_0}$. Also note that $S_0' \cup S_1^* \cup \{s_{k_1}\} \subseteq L_i(1, \ldots, j)$. Therefore, we have
$$q_i(S_2) = q_i(S_0 \cup S_2') = q_i(S_0' \cup S_2' \cup L_i(j+1)) \le q_i(S_0' \cup S_1^* \cup \{s_{k_1}\}) \le q_i(L_i(1, \ldots, j)) < p_{th},$$
which contradicts the assumption that $q_i(S_2) \ge p_{th}$.

From the above discussion, for a given grid point $g_i$, the distributed coverage-centric selection procedure given by Fig. 4 generates the subset of FT nodes with the minimum number of FT nodes for $g_i$. ■
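The claim of Theorem 8 can be checked numerically on small instances. The sketch below assumes independent detections, so that $q_i(S) = 1 - \prod_{s \in S}(1 - p_s)$; this model is an assumption made here (it has the two monotonicity properties the proof draws from Equation (8)), and only the sort-then-shortest-prefix idea of Fig. 4 is reproduced, not the full procedure. The greedy result is compared against an exhaustive search for the smallest qualifying subset. Function names are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Dict, List, Optional


def q(probs: List[float]) -> float:
    """Assumed coverage model: probability that at least one node detects."""
    return 1.0 - prod(1.0 - p for p in probs)


def greedy_ft(p: Dict[str, float], p_th: float) -> Optional[List[str]]:
    """Shortest prefix of L_i (candidates sorted by decreasing p) with q >= p_th."""
    order = sorted(p, key=p.get, reverse=True)
    for j in range(1, len(order) + 1):
        if q([p[s] for s in order[:j]]) >= p_th:
            return order[:j]
    return None  # even all candidates together miss the threshold


def min_size_brute_force(p: Dict[str, float], p_th: float) -> Optional[int]:
    """Smallest subset size reaching p_th, found by exhaustive search."""
    for size in range(1, len(p) + 1):
        if any(q([p[s] for s in c]) >= p_th for c in combinations(p, size)):
            return size
    return None


if __name__ == "__main__":
    p = {"a": 0.5, "b": 0.7, "c": 0.4, "d": 0.6}
    print(greedy_ft(p, 0.8), min_size_brute_force(p, 0.8))
```

Under this product model, the highest-probability nodes give the largest $q_i$ among all subsets of a given size, so the shortest qualifying prefix cannot be beaten by any smaller subset, which is what the case analysis above establishes.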


